
I'm scraping data from GitHub via PyGithub. My problem is that I receive this error while scraping:

github.GithubException.GithubException: 403 {'documentation_url': 'https://developer.github.com/v3/#rate-limiting', 'message': 'API rate limit exceeded for XXXXX.'}

When I query the API with curl, I get:

curl -i https://api.github.com/users/XXXXXX
HTTP/1.1 200 OK
Server: GitHub.com
Date: Thu, 14 Jul 2016 15:03:51 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 1301
Status: 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 52
X-RateLimit-Reset: 1468509718
Cache-Control: public, max-age=60, s-maxage=60
Vary: Accept
Last-Modified: Wed, 08 Jun 2016 13:29:08 GMT

Note the rate-limit headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 52
X-RateLimit-Reset: 1468509718

If I run my Python program again, I get another "API rate limit exceeded" message. I read GitHub's API documentation, and as far as I can tell I still have 52 requests left. If I can provide any more information to improve this question, let me know. Thank you.
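The three X-RateLimit-* headers give the quota, the requests left, and the reset time as a Unix timestamp in UTC epoch seconds. A minimal sketch for reading them out of raw `curl -i` output, assuming the header text is pasted in as a string; `parse_rate_limit` is a hypothetical helper, not part of PyGithub:

```python
from datetime import datetime, timezone


def parse_rate_limit(raw_headers: str) -> dict:
    """Extract the X-RateLimit-* values from raw `curl -i` header text."""
    values = {}
    for line in raw_headers.splitlines():
        # Lines without a colon (e.g. the "HTTP/1.1 200 OK" status line) are skipped
        name, sep, value = line.partition(":")
        if sep and name.strip().lower().startswith("x-ratelimit-"):
            values[name.strip().lower()] = int(value.strip())
    return {
        "limit": values["x-ratelimit-limit"],
        "remaining": values["x-ratelimit-remaining"],
        # X-RateLimit-Reset is a Unix timestamp in seconds (UTC)
        "reset": datetime.fromtimestamp(values["x-ratelimit-reset"], tz=timezone.utc),
    }


headers = """X-RateLimit-Limit: 60
X-RateLimit-Remaining: 52
X-RateLimit-Reset: 1468509718"""

info = parse_rate_limit(headers)
print(info["limit"], info["remaining"], info["reset"].isoformat())
# 60 52 2016-07-14T15:21:58+00:00
```

Decoded this way, the reset in the curl output above falls at 15:21:58 UTC, about 18 minutes after the response's Date header (15:03:51 GMT).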

Edit: To clarify, I am using credentials (a personal access token) to log in to GitHub:

from github import Github

ORGANIZATION = "ORG"
PERSONAL_ACCESS_TOKEN = "TOKEN"
g = Github(PERSONAL_ACCESS_TOKEN, per_page=100)
github_organization = g.get_organization(ORGANIZATION)
ChillMurray

2 Answers


I solved this problem in some previous work; here it is.

The 403 HTTP status denotes a forbidden request, which suggests that the credentials you provided don't let you access some endpoints.

So you may need to provide valid credentials (username / password) when creating the Github object:

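#!/usr/bin/env python3
from github import Github

ACCESS_USERNAME = "username"
# GitHub no longer accepts account passwords for API basic auth;
# a personal access token can be passed in place of the password.
ACCESS_PWD = "password"
client = Github(ACCESS_USERNAME, ACCESS_PWD, per_page=100)
user = client.get_user("ELLIOTTCABLE")

# Keep non-fork repositories; the listing already carries each repo's
# language, so no extra get_repo() request per repository is needed.
repos = [repo for repo in user.get_repos() if not repo.fork]
print([repo.name for repo in repos])

for repo in repos:
    print(repo.name, ":", repo.language)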

Hope you find it useful.

Farhan Ansari
  • Hey Farhan, thank you for the response, I appreciate it. However, I am already providing credentials (see my edit). The forbidden-request angle is something I hadn't considered. My only concern is: shouldn't the message then say what kind of forbidden request I'm making? My 403 says 'message': 'API rate limit exceeded for XXXXX.' – ChillMurray Jul 14 '16 at 15:51

So the issue wasn't my rate limit; it was the exception the PyGithub wrapper was raising. I traced my error back and found this class in the source code: https://github.com/PyGithub/PyGithub/blob/master/github/Requester.py

Peeking into the __createException function, I noticed this:

def __createException(self, status, headers, output):
    if status == 401 and output.get("message") == "Bad credentials":
        cls = GithubException.BadCredentialsException
    elif status == 401 and 'x-github-otp' in headers and re.match(r'.*required.*', headers['x-github-otp']):
        cls = GithubException.TwoFactorException  # pragma no cover (Should be covered)
    elif status == 403 and output.get("message").startswith("Missing or invalid User Agent string"):
        cls = GithubException.BadUserAgentException
    elif status == 403 and output.get("message").startswith("API Rate Limit Exceeded"):
        cls = GithubException.RateLimitExceededException
    elif status == 404 and output.get("message") == "Not Found":
        cls = GithubException.UnknownObjectException
    else:
        cls = GithubException.GithubException
    return cls(status, output)

Given the message of the exception I received, I assumed it was a RateLimitExceededException.

However, looking at the actual exception, I noticed it was GithubException.GithubException, which appears to be a blanket exception raised when none of the more specific checks match.
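The quoted excerpt selects RateLimitExceededException only when the message starts with "API Rate Limit Exceeded", and Python's `str.startswith` is case-sensitive, while GitHub's message reads "API rate limit exceeded for XXXXX.". The sketch below isolates that comparison; `classify_403` is a hypothetical helper that mimics only the rate-limit 403 branch of the excerpt, not actual PyGithub code:

```python
def classify_403(message: str, case_sensitive: bool = True) -> str:
    """Mimic the rate-limit 403 check from the quoted excerpt (hypothetical helper)."""
    prefix = "API Rate Limit Exceeded"
    if case_sensitive:
        # Same comparison as the excerpt: exact-case prefix match
        matched = message.startswith(prefix)
    else:
        # Case-insensitive variant for comparison
        matched = message.lower().startswith(prefix.lower())
    return "RateLimitExceededException" if matched else "GithubException"


msg = "API rate limit exceeded for XXXXX."
print(classify_403(msg))                        # GithubException
print(classify_403(msg, case_sensitive=False))  # RateLimitExceededException
```

So under the excerpt's check, a genuine rate-limit response can still surface as the generic GithubException.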

This seemed to answer my question: it wasn't an API rate-limit issue, because I still had requests left when I received this exception.

Unfortunately, it's a non-specific exception. This answers my initial question for now.

Update: I had also been querying the API with curl without a token, so it was not reporting the correct numbers. With the token, it shows that I had indeed used up all my requests.
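Once the token's own quota is known to be exhausted, a common pattern is to pause until the reset time before issuing more requests. (To see the token's quota from the shell, the request has to carry it, e.g. `curl -H "Authorization: token <TOKEN>" https://api.github.com/rate_limit`, where `<TOKEN>` is a placeholder.) A minimal sketch of the waiting logic, using a hypothetical `seconds_until_reset` helper that is not part of PyGithub:

```python
import time
from typing import Optional


def seconds_until_reset(remaining: int, reset_epoch: int, now: Optional[float] = None) -> float:
    """Seconds to sleep before the next request: 0 while quota remains,
    otherwise the time left until the X-RateLimit-Reset timestamp."""
    if remaining > 0:
        return 0.0
    current = time.time() if now is None else now
    # Never return a negative delay if the reset time has already passed
    return max(0.0, float(reset_epoch - current))


# now = the question's Date header (Thu, 14 Jul 2016 15:03:51 GMT) as a Unix timestamp
print(seconds_until_reset(52, 1468509718, now=1468508631))  # 0.0
# Same reset time, but with a hypothetical exhausted quota
print(seconds_until_reset(0, 1468509718, now=1468508631))   # 1087.0
```

One possible wiring in a PyGithub scraper: read `remaining, _ = g.rate_limiting` and `g.rate_limiting_resettime` from the client, then call `time.sleep(seconds_until_reset(remaining, g.rate_limiting_resettime))` before the next batch of calls.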

ChillMurray
  • According to your `Update` at the very end, the problem was that you were not observing the right data because your `GET rate-limit` call wasn't authenticated? Thus the real problem was indeed that you had reached the limit? – payne Feb 26 '21 at 22:09