Datasets generated from an export of the graph dated from
Referencing Software Heritage
If you use any of the datasets indexed on this website for research purposes, please acknowledge Software Heritage as recommended in the publications page, that is:
Add a footnote on the title page of your paper, formatted as: “This work was made possible by Software Heritage, the universal source code archive: https://www.softwareheritage.org”; and
This graph switches its minimal perfect hash function (MPH) from GOV/Cmph to PTHash. Rust code that hardcodes GOVMPH must replace it with DynMph or SwhidPthash. Reading this graph from Java is no longer supported.
If you use this dataset for research purposes, please cite the following paper: Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli. The Software Heritage Graph Dataset: Public software development under one roof. In Proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Co-located with ICSE 2019.
Download
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
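As a minimal sketch of the awscli route, the snippet below builds and runs an `aws s3 cp` command that recursively copies everything under an S3 prefix into a local directory. The helper names `build_download_command` and `download` are hypothetical, and the bucket URL in the usage comment is a placeholder rather than a real dataset location. The `--no-sign-request` flag assumes the bucket allows anonymous reads; drop it if you use your own AWS credentials.

```python
import shutil
import subprocess


def build_download_command(s3_url: str, dest_dir: str) -> list[str]:
    """Return the awscli argument list that copies everything under s3_url into dest_dir."""
    if not s3_url.startswith("s3://"):
        raise ValueError(f"expected an s3:// URL, got {s3_url!r}")
    return [
        "aws", "s3", "cp",
        "--recursive",        # copy every object under the prefix
        "--no-sign-request",  # assumption: the bucket is publicly readable
        s3_url,
        dest_dir,
    ]


def download(s3_url: str, dest_dir: str) -> None:
    """Run the copy, failing early if awscli is not installed."""
    if shutil.which("aws") is None:
        raise RuntimeError("awscli not found; install it first, e.g. `pip install awscli`")
    subprocess.run(build_download_command(s3_url, dest_dir), check=True)


# Usage (placeholder URL; substitute the S3 link listed for the dataset you want):
# download("s3://example-bucket/path/to/dataset/", "./dataset")
```

Separating command construction from execution makes it possible to inspect or log the exact command before any data is transferred.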
This teaser contains a subset of the 443 repositories archived by Software Heritage as of 2024-08-23, among the 700 GitHub repositories tagged as being written in Python with the most stars.