License blobs
- Dataset size
- 30 GiB
- Export date
- Derived from
- Full compressed graph [2025-05-18]
- SWH Annex URL
- https://annex.softwareheritage.org/public/dataset/license-blobs/2025-05-18/
Download
The HTTP links point to directories listing all available files.
wget --recursive --no-parent --reject "index.html*" https://annex.softwareheritage.org/public/dataset/license-blobs/2025-05-18/
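Because the dataset is about 30 GiB, an interrupted transfer is a realistic possibility. The sketch below wraps the command above in a small shell function that can be re-run to resume. The function name `fetch_dataset`, the variable `DATASET_URL`, and the destination argument are hypothetical names introduced for illustration; the extra flags (`--continue`, `--no-host-directories`, `--cut-dirs`, `--directory-prefix`) are standard GNU Wget options, but whether they suit your setup is an assumption to verify.

```shell
# Dataset location, copied from the "SWH Annex URL" entry above.
DATASET_URL="https://annex.softwareheritage.org/public/dataset/license-blobs/2025-05-18/"

# Hypothetical helper: mirror the dataset into the directory given as $1
# (defaults to the current directory). Nothing is downloaded until it is called.
fetch_dataset() {
    dest="${1:-.}"
    # --continue              resume partially downloaded files on re-run
    # --no-host-directories   do not create an annex.softwareheritage.org/ folder
    # --cut-dirs=3            drop the public/dataset/license-blobs/ path prefix
    # --directory-prefix      write everything under "$dest"
    wget --recursive --no-parent --continue \
         --no-host-directories --cut-dirs=3 \
         --reject "index.html*" \
         --directory-prefix="$dest" \
         "$DATASET_URL"
}
```

Calling `fetch_dataset ./license-blobs` should place the files under `./license-blobs/2025-05-18/`; running the same call again after an interruption should pick up where the previous run stopped.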
Referencing the dataset
If you use this dataset for research purposes, please acknowledge Software Heritage as recommended on the publications page, which means doing the following two things:
- Add a footnote on the title page of your paper, formatted as: “This work was made possible by Software Heritage, the universal source code archive: https://www.softwareheritage.org”
- Cite the following papers:
  - Stefano Zacchiroli. A large-scale dataset of (open source) license text variants. In 19th IEEE/ACM International Conference on Mining Software Repositories, MSR 2022, Pittsburgh, PA, USA, May 23-24, 2022, 757–761. ACM, 2022. URL: https://doi.org/10.1145/3524842.3528491, doi:10.1145/3524842.3528491. (BibTeX)
  - Jesús M. González-Barahona, Sergio Raúl Montes León, Gregorio Robles, and Stefano Zacchiroli. The software heritage license dataset (2022 edition). Empir. Softw. Eng., 28(6):147, 2023. URL: https://doi.org/10.1007/s10664-023-10377-w, doi:10.1007/S10664-023-10377-W. (BibTeX)