Graph export in columnar tables
A set of relational tables stored in a columnar format such as Apache ORC, which is particularly suited for scale-out analyses on data lakes and big data processing ecosystems such as the Hadoop environment.
- Dataset size
- 27 TiB
- Export date
- S3 URL
- s3://softwareheritage/graph/2025-05-18/orc/
- Deprecated
- False
Referencing Software Heritage
If you use any of the datasets indexed on this website for research purposes, please acknowledge Software Heritage as recommended in the publications page, that is:
- Add a footnote on the title page of your paper, formatted as: “This work was made possible by Software Heritage, the universal source code archive: https://www.softwareheritage.org”; and
-
cite at least one of the following papers:
- Roberto Di Cosmo, Stefano Zacchiroli. Software Heritage: Why and How to Preserve Software Source Code. iPRES 2017. (BibTeX)
- Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli. Building the universal archive of source code. Commun. ACM 61(10): 29-31 (2018). (BibTeX)
Specific datasets might recommend additional citations, to credit their creators.
Download the dataset
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2025-05-18/orc/ 2025-05-18-orc# ORswh datasets download-export 2025-05-18