I am using the requests library to scrape a website on an interval. I use selenium to log in and get the requisite cookies, then use requests to hit the API directly. Everything works for a few hours (30-50 requests), and then I invariably get this exception:

  File "attempt_enroll.py", line 104, in <module>
    resp = attemptEnroll(session)
  File "attempt_enroll.py", line 86, in attemptEnroll
    r = session.post(enroll_url, json=payload)
  File "/lib/python2.6/site-packages/requests/sessions.py", line 559, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/lib/python2.6/site-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/lib/python2.6/site-packages/requests/sessions.py", line 662, in send
    r.content
  File "/lib/python2.6/site-packages/requests/models.py", line 827, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "/lib/python2.6/site-packages/requests/models.py", line 752, in generate
    raise ChunkedEncodingError(e)
ChunkedEncodingError: ("Connection broken: error(9, 'Bad file descriptor')", error(9, 'Bad file descriptor'))

I thought there might be hanging sockets or file descriptors, but after an hour of running, the process still had only 4 open fds. Debugging has been difficult because the failure is so intermittent. This is my first time using the requests library.
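For reference, here is a minimal sketch of how open descriptors could be counted from inside the process itself. It is an illustration only: `count_open_fds` is a hypothetical helper name, the 4096 cap is an arbitrary assumption, and it is not necessarily how the figure above was obtained. It probes each descriptor number with `os.fstat` and counts the ones that are valid.

```python
import os
import resource


def count_open_fds(limit=None):
    """Count descriptors currently open in this process (Unix only)."""
    if limit is None:
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        # Assumption: cap the scan so an unlimited or huge soft limit stays cheap.
        if soft == resource.RLIM_INFINITY or soft > 4096:
            soft = 4096
        limit = soft
    count = 0
    for fd in range(limit):
        try:
            os.fstat(fd)  # raises OSError (EBADF) if fd is not open
        except OSError:
            continue
        count += 1
    return count
```

Logging this value before each request would show whether the descriptor count changes shortly before the failure.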

Here is a condensed version of the code. I have left all of the requests calls in the same place:

import traceback
from random import randint
from time import sleep

import requests

payload = {
    'some_stuff': True,
}
enroll_url = 'https://foo.ca'
expected = '''some string'''

#use selenium to login and (critically) run js on homepage to generate cookies
#then quit selenium and use the cookies to setup a requests session
def login(username, password):
    # <selenium code to login snipped>
    #retrieve all the cookies and kill webdriver since we don't need it anymore
    cookies = driver.get_cookies()
    driver.quit()

    s = requests.Session()

    for cookie in cookies:
        s.cookies.set(cookie['name'], cookie['value'])
        if(cookie['name'] == 'XSRF-TOKEN'):
            s.headers.update({
                'X-XSRF-TOKEN': cookie['value'],
                'Connection':'close',
            })
    return s

def attemptEnroll(session):
    if(session is None):
        return ""
    r = session.post(enroll_url, json=payload)
    return r.text

#number of failed attempts in a row
failed_count = 0
session = None
while True:
    worked = False
    errorMsg = "Unknown error"
    try:
        resp = attemptEnroll(session)
        worked = (resp == expected)
        errorMsg = resp
    except Exception as e:
        errorMsg = str(e) + traceback.format_exc()
    if worked:
        failed_count = 0
        #wait 2-7 minutes between requests
        wait = randint(2*60, 7*60)
        sleep(wait)
    else:
        sleep(failed_count*60)
        failed_count += 1
        #stop after 3 failures in a row
        if failed_count >= 3:
            break
        #otherwise create a new login and try again
        session = login("<snip>", "<snip>!")
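One variation that might help narrow the problem down is to explicitly close the failed session before building a new one, and to retry only on specific exception types rather than on any `Exception`. The sketch below is an assumption-laden illustration, not a confirmed fix for the error above: `call_with_fresh_session`, `make_session`, `action` and `retry_on` are hypothetical names, and it relies only on the session object having a `close()` method (which `requests.Session` does).

```python
import time


def call_with_fresh_session(make_session, action, retry_on, attempts=3, delay=0.0):
    """Run action(session); on a listed exception, close the session and rebuild it.

    make_session, action and retry_on are placeholders supplied by the caller.
    """
    session = make_session()
    for attempt in range(attempts):
        try:
            return action(session)
        except retry_on:
            # Release the failed session's pooled connections before retrying.
            session.close()
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
            session = make_session()
```

With the script above, this could be called as `call_with_fresh_session(lambda: login(username, password), attemptEnroll, requests.exceptions.ChunkedEncodingError)`, where `username` and `password` stand in for the snipped credentials.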
charliehorse55
  • There are various reasons for an invalid file descriptor. Have a look at this post (https://stackoverflow.com/questions/16511337/correct-way-to-try-except-using-python-requests-module) and add exception handlers for your particular situation. – yoonghm Sep 19 '18 at 01:08
  • The exception I'm getting isn't mentioned anywhere in those docs. It's also pertinent that I automatically try creating a new session and sending another request with a small delay, only to have it fail again with the same error. Restarting the application solves the issue. – charliehorse55 Sep 19 '18 at 01:20
  • In that case, I suspect there are memory leaks that are not being reclaimed. Try using an external program to monitor the memory and CPU usage of your program. Alternatively, restart your program regularly and automatically. – yoonghm Sep 19 '18 at 01:23
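
As an in-process complement to the external monitoring suggested in the last comment, the standard `resource` module (Unix only) can report CPU time and peak memory. This is a minimal sketch; `usage_snapshot` and its dictionary keys are hypothetical names chosen for illustration.

```python
import resource


def usage_snapshot():
    """Return CPU time and peak memory for the current process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return {
        'user_cpu_s': ru.ru_utime,    # seconds spent in user mode
        'system_cpu_s': ru.ru_stime,  # seconds spent in kernel mode
        # Peak resident set size: kilobytes on Linux, bytes on macOS.
        'max_rss': ru.ru_maxrss,
    }
```

Logging a snapshot on each loop iteration would show whether peak memory keeps growing over the hours before the failure.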
