Graph export in columnar tables
A set of relational tables stored in a columnar format such as Apache ORC. Columnar formats are particularly well suited to scale-out analyses on data lakes and in big data processing ecosystems such as Hadoop.
Referencing Software Heritage
If you use any of the datasets indexed on this website for research purposes, please acknowledge Software Heritage as recommended on the publications page, that is:
Add a footnote on the title page of your paper, formatted as: “This work was made possible by Software Heritage, the universal source code archive: https://www.softwareheritage.org”; and
cite at least one of the following papers:
Specific datasets might recommend additional citations, to credit their creators.
10 datasets
Dataset size
27 TiB
Export date
2025-05-18
S3 URL
s3://softwareheritage/graph/2025-05-18/orc/
Download
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2025-05-18/orc/ 2025-05-18-orc
# OR
swh datasets download-export 2025-05-18
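Downloading a whole export means transferring tens of TiB. When only one table is needed, the same `aws s3 cp` invocation can be narrowed with awscli's `--exclude`/`--include` filters. This is a sketch under an assumption this page does not confirm: that each table sits in its own sub-directory of the export prefix (`origin` below is only a hypothetical table name). Running `aws s3 ls --no-sign-request s3://softwareheritage/graph/2025-05-18/orc/` first shows the actual layout. The function name `swh_fetch_table` and the `DRY_RUN` switch are illustrative and not part of awscli or swh.datasets.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of awscli or swh.datasets): download a single
# table of an ORC export instead of the whole export.
# ASSUMPTION: each table is stored under s3://softwareheritage/graph/<date>/orc/<table>/
swh_fetch_table() {
  if [[ $# -ne 2 ]]; then
    echo "usage: swh_fetch_table EXPORT_DATE TABLE" >&2
    return 2
  fi
  local date="$1" table="$2"
  # awscli evaluates --exclude/--include patterns relative to the source
  # prefix, and later filters take precedence: exclude everything, then
  # re-include only the chosen table's files.
  local -a cmd=(
    aws s3 cp --recursive --no-sign-request
    --exclude '*' --include "${table}/*"
    "s3://softwareheritage/graph/${date}/orc/" "${date}-orc"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # print the command (for display only) instead of running it
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 swh_fetch_table 2024-12-06 origin` only prints the command; without `DRY_RUN=1` it downloads the matching files into `2024-12-06-orc/`, following the same local naming as the full-export command.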
Dataset size
23 TiB
Export date
2024-12-06
S3 URL
s3://softwareheritage/graph/2024-12-06/orc/
Download
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2024-12-06/orc/ 2024-12-06-orc
# OR
swh datasets download-export 2024-12-06
Dataset size
19 TiB
Export date
2024-08-23
Teaser dataset
Popular 500 python columnar tables (36 GB)
S3 URL
s3://softwareheritage/graph/2024-08-23/orc/
Download
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2024-08-23/orc/ 2024-08-23-orc
# OR
swh datasets download-export 2024-08-23
Dataset size
18 TiB
Export date
2024-05-16
S3 URL
s3://softwareheritage/graph/2024-05-16/orc/
Download
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2024-05-16/orc/ 2024-05-16-orc
# OR
swh datasets download-export 2024-05-16
Dataset size
18 TiB
Export date
2023-09-06
Teaser dataset
Popular 1k columnar tables (280 GB)
S3 URL
s3://softwareheritage/graph/2023-09-06/orc/
Download
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2023-09-06/orc/ 2023-09-06-orc
# OR
swh datasets download-export 2023-09-06
Dataset size
13 TiB
Export date
2022-12-07
S3 URL
s3://softwareheritage/graph/2022-12-07/orc/
Download
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2022-12-07/orc/ 2022-12-07-orc
# OR
swh datasets download-export 2022-12-07
Dataset size
11 TiB
Export date
2022-04-25
S3 URL
s3://softwareheritage/graph/2022-04-25/orc/
Download
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2022-04-25/orc/ 2022-04-25-orc
# OR
swh datasets download-export 2022-04-25
Dataset size
8.4 TiB
Export date
2021-03-23
Teaser dataset
Popular 3k python columnar tables (36 GB)
S3 URL
s3://softwareheritage/graph/2021-03-23/orc/
Download
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2021-03-23/orc/ 2021-03-23-orc
# OR
swh datasets download-export 2021-03-23
Comments
The export consists of:
edges as graph.edges.{cnt,ori,rel,rev,snp}.csv.zst and graph.edges.dir.{00..21}.csv.zst
nodes as graph.nodes.csv.zst
deduplicated labels as graph.labels.csv.zst
statistics as graph.edges.count.txt, graph.edges.stats.txt, graph.labels.count.txt, graph.nodes.count.txt, and graph.nodes.stats.txt
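The file list above is regular enough to regenerate with bash brace expansion, which is convenient for checking that a download of this CSV export is complete. This is a sketch: it assumes the files sit directly in the download directory (for instance `2020-12-15-csv/`), and both function names are hypothetical. Zero-padded ranges such as `{00..21}` require bash 4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: enumerate the files listed for the 2020-12-15 CSV
# export and report which of them are missing from a local directory.
# Needs bash >= 4 so that {00..21} expands with zero padding (00, 01, ...).

swh_csv_expected_files() {
  # edge files, one per suffix listed in the comments
  printf '%s\n' graph.edges.{cnt,ori,rel,rev,snp}.csv.zst
  # directory edges, split into 22 files numbered 00 through 21
  printf '%s\n' graph.edges.dir.{00..21}.csv.zst
  # nodes and deduplicated labels
  printf '%s\n' graph.nodes.csv.zst graph.labels.csv.zst
  # statistics files
  printf '%s\n' graph.edges.count.txt graph.edges.stats.txt \
    graph.labels.count.txt graph.nodes.count.txt graph.nodes.stats.txt
}

swh_csv_missing_files() {
  # ASSUMPTION: all files are stored flat in the given directory
  local dir="${1:-.}" f
  while IFS= read -r f; do
    [[ -e "${dir}/${f}" ]] || printf '%s\n' "$f"
  done < <(swh_csv_expected_files)
}
```

`swh_csv_missing_files 2020-12-15-csv` prints nothing when every listed file is present, and otherwise prints one missing file name per line.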
Dataset size
8.4 TiB
Export date
2020-12-15
S3 URL
s3://softwareheritage/graph/2020-12-15/csv/
Download
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2020-12-15/csv/ 2020-12-15-csv
deprecated
Comments
A full export of the graph as of January 2019. The export was carried out in two phases, labeled "2018-09-25" and "2019-01-28". Both labels refer to the same dataset, but the different formats are not fully consistent with one another.
Dataset size
1.2 TiB
Export date
2018-09-25
Teaser datasets
Popular 4k columnar tables (27 GB)
Popular 3k python columnar tables (5.3 GB)
S3 URL
s3://softwareheritage/graph/2018-09-25/parquet/
SWH Annex URL
https://annex.softwareheritage.org/public/dataset/graph/2018-09-25/parquet/
Download
The HTTP links point to directories listing all available files.
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2018-09-25/parquet/ 2018-09-25-parquet
wget --recursive --no-parent --reject "index.html*" https://annex.softwareheritage.org/public/dataset/graph/2018-09-25/parquet/
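By default, `wget --recursive` recreates the full remote path locally, here `annex.softwareheritage.org/public/dataset/graph/2018-09-25/parquet/`. A variant that keeps only the last two path components (`2018-09-25/parquet/`) adds wget's `--no-host-directories` and `--cut-dirs` options; the sketch below computes the `--cut-dirs` value from the URL. The function name `swh_annex_mirror` and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirror an annex directory with the wget flags shown
# above, plus --no-host-directories and --cut-dirs so that only the last two
# path components (date/format) are recreated locally.
swh_annex_mirror() {
  local url="$1" path
  local -a parts
  # strip scheme and host, e.g. "public/dataset/graph/2018-09-25/parquet/"
  path="${url#*://*/}"
  path="${path%/}"
  IFS=/ read -ra parts <<< "$path"
  # cut every leading directory except the final two
  local cut=$(( ${#parts[@]} - 2 ))
  (( cut < 0 )) && cut=0
  local -a cmd=(wget --recursive --no-parent --reject "index.html*"
                --no-host-directories "--cut-dirs=${cut}" "$url")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # print the command (for display only) instead of running it
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

With the Parquet URL above, the computed value is `--cut-dirs=3`, which drops `public/dataset/graph/` from the local paths.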
deprecated
By accessing the datasets, you agree to the Software Heritage Ethical Charter for using the archive data, the terms of use for bulk access, and the Software Heritage principles for large language models.
To learn how to use the datasets, read the documentation.
Software Heritage — Copyright (C) 2025, The Software Heritage developers.
Licenses: GNU AGPLv3+ (code) / Creative Commons Attribution 4.0 International license (datasets).