Datasets based on the 2022-04-25 export
Datasets generated from an export of the graph dated 2022-04-25.
4 datasets
Dataset size: 6.5 TiB
Export date: 2022-04-25
Derived datasets: Popular Contents, Path counts
S3 URL: s3://softwareheritage/graph/2022-04-25/compressed/
Download
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2022-04-25/compressed/ 2022-04-25-compressed
# OR
swh datasets download-graph 2022-04-25
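The S3 URLs for the graph exports on this page differ only in the export date and the dataset kind, so a small shell helper can build the awscli command. The following is a minimal sketch, not part of the Software Heritage tooling: the function names `swh_graph_url` and `swh_print_download` are hypothetical, and the URL layout is assumed from the `compressed` and `orc` URLs listed on this page. It only prints the command, so nothing is downloaded until you run the printed line yourself.

```shell
# Hypothetical helpers: names and URL layout are assumptions
# based on the S3 URLs shown on this page.

# Build the S3 URL for a graph export.
#   $1 = export date (e.g. 2022-04-25), $2 = kind: compressed or orc
swh_graph_url() {
    case "$2" in
        compressed|orc) printf 's3://softwareheritage/graph/%s/%s/\n' "$1" "$2" ;;
        *) echo "unknown kind: $2" >&2; return 1 ;;
    esac
}

# Print (do not run) the awscli command that would fetch the export
# into a local directory named <date>-<kind>.
swh_print_download() {
    url=$(swh_graph_url "$1" "$2") || return 1
    printf 'aws s3 cp --recursive --no-sign-request %s %s-%s\n' "$url" "$1" "$2"
}
```

For example, `swh_print_download 2022-04-25 compressed` prints the awscli command listed for the compressed graph. Appending awscli's `--dryrun` flag to the printed command shows what would be copied without transferring anything.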
Dataset size: 11 TiB
Export date: 2022-04-25
S3 URL: s3://softwareheritage/graph/2022-04-25/orc/
Download
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2022-04-25/orc/ 2022-04-25-orc
# OR
swh datasets download-export 2022-04-25
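At 11 TiB for the ORC export and 6.5 TiB for the compressed graph, it may be worth checking free space on the target filesystem before starting a transfer. The sketch below is an illustration only: the function names `tib_tenths_to_kib` and `has_room_for` are hypothetical, and sizes are passed in tenths of a TiB (110 for 11 TiB, 65 for 6.5 TiB) because POSIX shell arithmetic is integer-only. It relies on `df -Pk`, which reports available space in 1024-byte blocks.

```shell
# Hypothetical helpers, not part of awscli or swh.datasets.

# Convert a size in tenths of a TiB to KiB (1 TiB = 1024^3 KiB).
tib_tenths_to_kib() {
    echo $(( $1 * 1024 * 1024 * 1024 / 10 ))
}

# Succeed if the filesystem holding directory $1 has at least
# $2 tenths of a TiB available.
has_room_for() {
    needed=$(tib_tenths_to_kib "$2")
    # In POSIX df output, line 2, column 4 is the available space in KiB.
    available=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$available" -ge "$needed" ]
}
```

A possible use is `has_room_for . 110 || echo "not enough space for the ORC export"`, run from the directory that will receive the download.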
Dataset size: Unknown
Export date: 2022-04-25
Derived from: Compressed graph [2022-04-25]
S3 URL: s3://softwareheritage/derived_datasets/2022-04-25/path_counts_forward_ori,snp,rel,rev,dir,cnt/
Download
aws s3 cp --recursive --no-sign-request s3://softwareheritage/derived_datasets/2022-04-25/path_counts_forward_ori,snp,rel,rev,dir,cnt/ 2022-04-25-path_counts
Comments: A deprecated dataset listing the most popular name of each content. Replaced by the Aggregated Contents Dataset.
Dataset size: Unknown
Export date: 2022-04-25
Derived from: Compressed graph [2022-04-25]
S3 URL: s3://softwareheritage/derived_datasets/2022-04-25/popular_contents/
Download
aws s3 cp --recursive --no-sign-request s3://softwareheritage/derived_datasets/2022-04-25/popular_contents/ 2022-04-25-popular_contents
deprecated
By accessing the datasets, you agree with the Software Heritage Ethical Charter for using the archive data, the terms of use for bulk access, and the Software Heritage principles for large language models.
To learn how to use the datasets, read the documentation.
If you use these datasets for research purposes, please cite the following paper:
Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli.
The Software Heritage Graph Dataset: Public software development under one roof.
In Proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Co-located with ICSE 2019.
Software Heritage — Copyright (C) 2025, The Software Heritage developers.
Licenses: GNU AGPLv3+ (code) / Creative Commons Attribution 4.0 International license (datasets).