This is a compressed graph of only the "history and hosting" layer (origins, snapshots, releases, revisions) and the root directory (or rarely content) of every revision/release; but most directories and contents are excluded
If you use these datasets for research purposes, please cite the following paper:
Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli.
The Software Heritage Graph Dataset: Public software development under one roof.
In proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Co-located with ICSE 2019.
preprint, bibtex