
I'm trying to get all the contributors of a repo using the GitHub API's contributors endpoint.

If I understand correctly, the API docs also say that when a repo has more than 500 contributors, only 500 of them are returned and the rest are marked as anonymous:

For performance reasons, only the first 500 author email addresses in the repository will be linked to GitHub users.

The Linux kernel repo (torvalds/linux) has 5,000+ contributors, so according to the API I should get at least 500 of them.

When I run

curl -I 'https://api.github.com/repos/torvalds/linux/contributors?per_page=100'

I get only 3 pages (with per_page=100), so at most 300 contributors (look at the "Link" header).

Is there a way to get all of the repo's contributors (5,000+)?

HTTP/1.1 200 OK
Server: GitHub.com
Date: Thu, 19 Nov 2015 18:00:54 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 100308
Status: 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 56
X-RateLimit-Reset: 1447958881
Cache-Control: public, max-age=60, s-maxage=60
Last-Modified: Thu, 19 Nov 2015 16:06:38 GMT
ETag: "a57e0f74fc68e1791da15d33fa044616"
Vary: Accept
X-GitHub-Media-Type: github.v3
Link: <https://api.github.com/repositories/2325298/contributors?per_page=100&page=2>; rel="next", <https://api.github.com/repositories/2325298/contributors?per_page=100&page=3>; rel="last"
X-XSS-Protection: 1; mode=block
X-Frame-Options: deny
Content-Security-Policy: default-src 'none'
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval
Access-Control-Allow-Origin: *
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
X-Content-Type-Options: nosniff
Vary: Accept-Encoding
X-Served-By: a30e6f9aa7cf5731b87dfb3b9992202d
X-GitHub-Request-Id: 67E881D2:146C9:24CF1BB3:564E0E55
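Paging through the list means following the Link header: each response's rel="next" URL points at the following page, and when rel="next" is absent you are on the last page. Below is a minimal POSIX shell sketch of that loop, assuming curl, grep, sed and tr are available and that requests are unauthenticated (hence the X-RateLimit-Limit: 60 above). The function names are hypothetical. It can only collect what the API exposes, so for torvalds/linux it still stops after the 3 pages reported by rel="last".

```shell
#!/bin/sh
# Sketch (assumption: unauthenticated requests, POSIX sh + curl).

# Print the URL tagged rel="next" in a Link header line, or nothing.
next_url() {
  printf '%s\n' "$1" | tr ',' '\n' | grep 'rel="next"' \
    | sed -n 's/.*<\([^>]*\)>.*/\1/p'
}

# Print the page number from the rel="last" entry of a Link header line.
# [?&] ensures we match "&page=", not the "_page=" inside "per_page=".
last_page() {
  printf '%s\n' "$1" | tr ',' '\n' | grep 'rel="last"' \
    | sed -n 's/.*[?&]page=\([0-9][0-9]*\).*/\1/p'
}

# Fetch every page of /repos/OWNER/REPO/contributors. Each page's JSON
# array is written to stdout in turn (a sequence of arrays, not one array).
fetch_all_contributors() {
  url="https://api.github.com/repos/$1/contributors?per_page=100"
  while [ -n "$url" ]; do
    hdrs=$(mktemp) || return 1
    # -D saves the response headers to a file; the body goes to stdout.
    curl -sS -D "$hdrs" "$url" || { rm -f "$hdrs"; return 1; }
    # Header name may be "Link" or "link"; strip the trailing CR.
    link=$(grep -i '^link:' "$hdrs" | tr -d '\r')
    rm -f "$hdrs"
    url=$(next_url "$link")
  done
}
```

Usage: fetch_all_contributors torvalds/linux > contributors.json. Splitting the header on commas is a simplification that works for GitHub's Link values, whose URLs contain no commas.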
ekostadinov
simplyblue

1 Answer


Since the GitHub API doesn't seem to support this, another (much slower) approach is to clone the repo and run this command to list author names:

git log --all --format='%aN' | sort -u

To list authors by email address instead (which should guard against contributors changing their configured name, and is more accurate):

git log --all --format='%aE' | sort -u
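The same person sometimes commits with addresses that differ only in case (for example Foo@Example.com and foo@example.com), which sort -u counts twice. A small sketch, assuming you want such variants merged and only need a count; count_authors is a hypothetical name and the repo path is passed as its argument:

```shell
# Hypothetical helper: count distinct author emails in the repo at $1,
# lowercasing each address so case-only variants are counted once.
count_authors() {
  git -C "$1" log --all --format='%aE' \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u \
    | wc -l \
    | tr -d ' '   # some wc implementations pad the number with spaces
}
```

This does not merge genuinely different addresses belonging to one person; git's .mailmap mechanism (which %aE already honours) is the usual place to record those.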

If you need this for arbitrary repos, you could write a simple script that takes the repository URL, clones the repo, runs one of the commands above, and then deletes the clone.
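A minimal sketch of such a script as a shell function, assuming git is installed; list_authors is a hypothetical name. It makes a bare clone (no working-tree checkout, full history) into a temporary directory and relies on an EXIT trap inside a subshell so the clone is removed however the function ends.

```shell
# Hypothetical helper: print distinct author emails of the repo at URL $1.
# The body is a ( ... ) subshell, so the EXIT trap fires when it returns
# and the temporary clone is always deleted.
list_authors() (
  tmp=$(mktemp -d) || exit 1
  trap 'rm -rf "$tmp"' EXIT
  # A bare clone skips checking out files but still has every commit.
  git clone --quiet --bare "$1" "$tmp/repo" || exit 1
  git -C "$tmp/repo" log --all --format='%aE' | sort -u
)
```

For example, list_authors https://github.com/torvalds/linux.git | wc -l counts distinct author emails in the kernel, although cloning that much history takes a long time.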

In the meantime, you could contact GitHub in the hope that they prioritize expanding or fixing this part of their API.

Jonathan.Brink
  • This is infinitely slower than using the GitHub API. There is a much better way: iterate through the contributors list. – Whitecat Dec 09 '16 at 01:08
  • @Whitecat But the GitHub API doesn't return all the information, so this is currently the only way to do it. – deFreitas Dec 02 '17 at 01:34