Graph export in CSV
This export represents the graph's nodes and edges as CSV files rather than in a columnar format.
- Comments
- edges as graph.edges.{cnt,ori,rel,rev,snp}.csv.zst and graph.edges.dir.{00..21}.csv.zst
- nodes as graph.nodes.csv.zst
- deduplicated labels as graph.labels.csv.zst
- statistics as graph.edges.count.txt, graph.edges.stats.txt, graph.labels.count.txt, graph.nodes.count.txt, and graph.nodes.stats.txt
- Dataset size
- 8.4 TiB
- Export date
- S3 URL
- s3://softwareheritage/graph/2020-12-15/csv/
- Deprecated
- True
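The edge file patterns in the Comments entry use shell-style brace notation. As a minimal sketch (the function name `expected_edge_files` is hypothetical, and the names are derived only from the patterns on this page, not from a listing of the bucket), the following POSIX shell function expands them into explicit file names, which may help when checking that a download is complete:

```shell
#!/bin/sh
# Expand the edge file patterns from this page into one name per line.
# Assumption: the real file names follow the patterns exactly as written.
expected_edge_files() {
    # graph.edges.{cnt,ori,rel,rev,snp}.csv.zst
    for suffix in cnt ori rel rev snp; do
        printf 'graph.edges.%s.csv.zst\n' "$suffix"
    done
    # graph.edges.dir.{00..21}.csv.zst (zero-padded, 22 files)
    i=0
    while [ "$i" -le 21 ]; do
        printf 'graph.edges.dir.%02d.csv.zst\n' "$i"
        i=$((i + 1))
    done
}
```

Running `expected_edge_files | wc -l` should count 27 names: 5 from the first pattern and 22 from the second.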
Referencing Software Heritage
If you use any of the datasets indexed on this website for research purposes, please acknowledge Software Heritage as recommended on the publications page, that is:
- Add a footnote on the title page of your paper, formatted as: “This work was made possible by Software Heritage, the universal source code archive: https://www.softwareheritage.org”; and
- cite at least one of the following papers:
- Roberto Di Cosmo, Stefano Zacchiroli. Software Heritage: Why and How to Preserve Software Source Code. iPRES 2017.
- Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli. Building the universal archive of source code. Commun. ACM 61(10): 29-31 (2018).
Specific datasets might recommend additional citations, to credit their creators.
Download the dataset
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2020-12-15/csv/ 2020-12-15-csv
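Because the full export is 8.4 TiB, it may be preferable to fetch individual files first. The sketch below assumes awscli is installed. The helper names `s3_url` and `fetch_one` are hypothetical, and the file names in the usage comments come from the list on this page rather than from a verified bucket listing. It only defines the helpers and does not start any download:

```shell
#!/bin/sh
# Base URL taken from the "S3 URL" entry on this page.
BASE_URL='s3://softwareheritage/graph/2020-12-15/csv/'

# Build the full S3 URL for one file of the export.
s3_url() {
    printf '%s%s\n' "$BASE_URL" "$1"
}

# Download a single file into a target directory (default: current directory).
# --no-sign-request allows anonymous access, as in the command above.
fetch_one() {
    aws s3 cp --no-sign-request "$(s3_url "$1")" "${2:-.}/"
}

# Example usage (not executed here; the last line assumes zstd is installed):
#   fetch_one graph.nodes.count.txt
#   fetch_one graph.labels.csv.zst data
#   zstd -dc data/graph.labels.csv.zst | head -n 5
```

Starting with the small statistics files, then peeking at the first lines of a compressed CSV, gives a sense of the layout before committing to the recursive copy.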