License blobs
Dataset of the complete texts of free/open source software (FOSS) license variants.
- Dataset size: 15.2 GiB
- Export date:
- Derived from: Full compressed graph [2022-04-25]
- SWH Annex URL: https://annex.softwareheritage.org/public/dataset/license-blobs/2022-04-25/
- Deprecated: False
Download the dataset
The HTTP links point to directories listing all available files.
wget --recursive --no-parent --reject "index.html*" https://annex.softwareheritage.org/public/dataset/license-blobs/2022-04-25/
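Because the dataset is 15.2 GiB, a download that can be resumed after an interruption may be convenient. The following is a minimal sketch of one way to do that, not part of the official instructions. It adds the standard wget options --continue, --no-host-directories, and --cut-dirs to the command above so that files land directly in the current directory. The helper count_dirs and the RUN_DOWNLOAD guard variable are hypothetical names introduced here for illustration only.

```shell
#!/bin/sh
# Sketch only: a resumable variant of the wget command above.
# count_dirs and RUN_DOWNLOAD are illustrative names, not part of any official tooling.
DATASET_URL="https://annex.softwareheritage.org/public/dataset/license-blobs/2022-04-25/"

# Print the number of directory components in a URL's path,
# i.e. the value --cut-dirs needs in order to drop all of them locally.
count_dirs() {
    path="${1#*://}"                 # drop the scheme: host/dir1/dir2/
    case "$path" in
        */*) ;;                      # there is a path after the host
        *) echo 0; return ;;         # host only, nothing to cut
    esac
    path="${path#*/}"                # drop the host: dir1/dir2/
    path="${path%/}"                 # drop a trailing slash
    if [ -z "$path" ]; then
        echo 0
        return
    fi
    echo "$path" | awk -F/ '{ print NF }'
}

# Download only when explicitly requested, since the transfer is large.
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
    wget --recursive --no-parent --reject "index.html*" \
         --continue --no-host-directories \
         --cut-dirs="$(count_dirs "$DATASET_URL")" \
         "$DATASET_URL"
fi
```

Saved as, for example, fetch.sh, the script would be started with RUN_DOWNLOAD=1 sh fetch.sh and can be run again after an interruption to pick up where it stopped. Without the variable set it only defines the helper and downloads nothing.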
Referencing the dataset
If you use this dataset for research purposes, please acknowledge Software Heritage as recommended on the publications page, which means doing the following two things:
- Add a footnote on the title page of your paper, formatted as: “This work was made possible by Software Heritage, the universal source code archive: https://www.softwareheritage.org”
- Cite the following papers:
  - Stefano Zacchiroli. A large-scale dataset of (open source) license text variants. In 19th IEEE/ACM International Conference on Mining Software Repositories, MSR 2022, Pittsburgh, PA, USA, May 23-24, 2022, pages 757–761. ACM, 2022. URL: https://doi.org/10.1145/3524842.3528491.
  - Jesús M. González-Barahona, Sergio Raúl Montes León, Gregorio Robles, and Stefano Zacchiroli. The Software Heritage license dataset (2022 edition). Empir. Softw. Eng., 28(6):147, 2023. URL: https://doi.org/10.1007/s10664-023-10377-w.