The popular-3k-python teaser contains a subset of 2197 popular repositories tagged as being written in the Python language, from GitHub, GitLab.com, PyPI and Debian. The software origins were selected using the following criteria:
the 580 most popular GitHub projects written in Python (by number of stars),
the 135 GitLab.com projects written in Python that have 2 stars or more,
the 827 most popular PyPI projects (by usage statistics, according to the Top PyPI Packages database),
the 655 most popular Debian packages with the debtag implemented-in::python (by "votes" according to the Debian Popularity Contest database).
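
As a quick sanity check, the four per-source counts listed above can be summed and compared against the stated total of 2197 origins. The snippet below is only an illustrative sketch: the dictionary name `origins_per_source` and its layout are assumptions made for this example and are not part of the dataset or its tooling.

```python
# Per-source counts copied from the selection criteria above.
# The variable names are hypothetical and used only for this illustration.
origins_per_source = {
    "GitHub": 580,
    "GitLab.com": 135,
    "PyPI": 827,
    "Debian": 655,
}

# Total number of origins and the fraction contributed by each source.
total = sum(origins_per_source.values())
shares = {source: count / total for source, count in origins_per_source.items()}

for source, count in origins_per_source.items():
    print(f"{source:<11}{count:>5}  {shares[source]:6.1%}")
print(f"{'total':<11}{total:>5}")
```

The computed total matches the 2197 repositories quoted in the description, so the four criteria account for the whole teaser.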
If you use these datasets for research purposes, please cite the following paper:
Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli.
The Software Heritage Graph Dataset: Public software development under one roof.
In proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Co-located with ICSE 2019.