License blobs

Dataset of the complete texts of free/open source software (FOSS) license variants.

Dataset size: 15.2 GiB
Export date:
Derived from: Full compressed graph [2022-04-25]
SWH Annex URL: https://annex.softwareheritage.org/public/dataset/license-blobs/2022-04-25/
Deprecated: False

Download the dataset

The HTTP links point to directories listing all available files.

wget --recursive --no-parent --reject "index.html*" https://annex.softwareheritage.org/public/dataset/license-blobs/2022-04-25/
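
Because the dataset is about 15 GiB, it may help to make the download resumable and to keep the local directory layout flat. The following is a minimal sketch, not an official Software Heritage script: it wraps the same wget command with the standard `--continue`, `--no-host-directories`, `--cut-dirs` and `--directory-prefix` options. The `DEST_DIR` and `DRY_RUN` variables and the `fetch_dataset` function name are assumptions introduced for illustration. By default the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: DEST_DIR, DRY_RUN and fetch_dataset are hypothetical names,
# not part of any Software Heritage tooling.

# Directory listing URL taken from the dataset page.
BASE_URL="https://annex.softwareheritage.org/public/dataset/license-blobs/2022-04-25/"

# Local target directory (assumed name; override via the environment).
DEST_DIR="${DEST_DIR:-license-blobs-2022-04-25}"

fetch_dataset() {
    # --continue              resume partially downloaded files
    # --no-host-directories   do not create annex.softwareheritage.org/
    # --cut-dirs=4            drop public/dataset/license-blobs/2022-04-25/
    # --directory-prefix      store everything under $DEST_DIR
    set -- wget --recursive --no-parent --continue \
        --no-host-directories --cut-dirs=4 \
        --reject "index.html*" \
        --directory-prefix="$DEST_DIR" \
        "$BASE_URL"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default): print the command instead of executing it.
        # The printed form is for inspection and omits shell quoting.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

fetch_dataset
```

Setting `DRY_RUN=0` in the environment starts the actual download. Rerunning the script after an interruption should let wget pick up partially downloaded files instead of starting over.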

Referencing the dataset

If you use this dataset for research purposes, please acknowledge Software Heritage as recommended on the publications page, which means doing the following two things:

  1. Add a footnote on the title page of your paper, formatted as: “This work was made possible by Software Heritage, the universal source code archive: https://www.softwareheritage.org”
  2. Cite the following papers:

By accessing the datasets, you agree to the Software Heritage Ethical Charter for using the archive data, the terms of use for bulk access, and the Software Heritage principles for large language models.

To learn how to use the datasets, read the documentation.