Popular 3k Python compressed graph
A compact and highly efficient representation of the graph dataset, suited for scale-up analysis on high-end machines with large amounts of memory. The graph is compressed using the Boldi-Vigna representation and is designed to be loaded by the WebGraph framework, specifically through our swh-graph library.
- Comments
The popular-3k-python teaser contains a subset of 2197 popular repositories tagged as being written in the Python language, from GitHub, GitLab.com, PyPI and Debian. The software origins were selected according to the following criteria:
- the 580 most popular GitHub projects written in Python (by number of stars),
- the 135 GitLab.com projects written in Python that have 2 stars or more,
- the 827 most popular PyPI projects (by usage statistics, according to the Top PyPI Packages database),
- the 655 most popular Debian packages with the debtag implemented-in::python (by "votes" according to the Debian Popularity Contest database).
- Dataset size
- 15 GB
- Export date
- Teaser of
- Compressed graph [2021-03-23]
- S3 URL
- s3://softwareheritage/graph/2021-03-23-popular-3k-python/compressed/
- Deprecated
- False
Download the dataset
For Amazon S3 links, you'll need to install either awscli or swh.datasets.
aws s3 cp --recursive --no-sign-request s3://softwareheritage/graph/2021-03-23-popular-3k-python/compressed/ 2021-03-23-popular-3k-python-compressed
# OR
swh datasets download-graph 2021-03-23-popular-3k-python
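If neither command-line tool is convenient, the same anonymous download can be scripted. The following is a minimal sketch, not an official Software Heritage tool: it assumes the third-party boto3 package is installed, and the helper names `local_path` and `download_prefix` as well as the default region are illustrative assumptions. Only the bucket name and prefix come from the S3 URL above.

```python
import os

# Taken from the S3 URL listed above.
BUCKET = "softwareheritage"
PREFIX = "graph/2021-03-23-popular-3k-python/compressed/"


def local_path(key, prefix, dest_dir):
    """Map an S3 object key to a file path under dest_dir (hypothetical helper)."""
    # Drop the shared prefix so files land directly under dest_dir.
    relative = key[len(prefix):] if key.startswith(prefix) else key
    return os.path.join(dest_dir, *relative.split("/"))


def download_prefix(bucket, prefix, dest_dir, region="us-east-1"):
    """Download every object under s3://bucket/prefix without credentials.

    The region default is an assumption; change it if requests are rejected.
    """
    # boto3 is a third-party dependency (pip install boto3), imported lazily
    # so that the path helper above works without it.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # UNSIGNED plays the role of the CLI's --no-sign-request flag.
    s3 = boto3.client(
        "s3", region_name=region, config=Config(signature_version=UNSIGNED)
    )
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip "directory" placeholder objects
            target = local_path(key, prefix, dest_dir)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(bucket, key, target)


# Example call (transfers roughly 15 GB, so it is left commented out):
# download_prefix(BUCKET, PREFIX, "2021-03-23-popular-3k-python-compressed")
```

The script mirrors the `aws s3 cp --recursive` invocation: it lists the prefix page by page and writes each object to the matching relative path in the destination directory.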