I am using the requests library to scrape a website on an interval. I use Selenium to log in and get the requisite cookies, then use requests to hit the API directly. Everything works nicely for a few hours (30-50 requests), and then I invariably get this exception:
      File "attempt_enroll.py", line 104, in <module>
        resp = attemptEnroll(session)
      File "attempt_enroll.py", line 86, in attemptEnroll
        r = session.post(enroll_url, json=payload)
      File "/lib/python2.6/site-packages/requests/sessions.py", line 559, in post
        return self.request('POST', url, data=data, json=json, **kwargs)
      File "/lib/python2.6/site-packages/requests/sessions.py", line 512, in request
        resp = self.send(prep, **send_kwargs)
      File "/lib/python2.6/site-packages/requests/sessions.py", line 662, in send
        r.content
      File "/lib/python2.6/site-packages/requests/models.py", line 827, in content
        self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
      File "/lib/python2.6/site-packages/requests/models.py", line 752, in generate
        raise ChunkedEncodingError(e)
    ChunkedEncodingError: ("Connection broken: error(9, 'Bad file descriptor')", error(9, 'Bad file descriptor'))
I thought there might be hanging sockets or leaked file descriptors, but after an hour of running, the process still had only 4 open fds. It has been difficult to debug because it happens so intermittently. This is my first time using the requests library.
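For reference, the fd count can be checked from inside the process roughly like this. This is a minimal sketch, assuming Linux (it reads /proc); `count_open_fds` is a hypothetical helper name for illustration and is not part of the script shown in this post.

```python
import os


def count_open_fds(pid=None):
    """Return the number of open file descriptors for a process (Linux only).

    Hypothetical helper for illustration; assumes /proc is mounted.
    """
    # /proc/<pid>/fd holds one entry per open descriptor;
    # "self" refers to the current process.
    target = "self" if pid is None else str(pid)
    # Listing the directory briefly opens a descriptor of its own,
    # so the result may be one higher than the steady-state count.
    return len(os.listdir("/proc/%s/fd" % target))


if __name__ == "__main__":
    print("open fds: %d" % count_open_fds())
```

Logging this value before each request would show whether the count grows over time or stays flat until the failure.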
Here is a condensed version of the code; I have left all of the requests calls in the same place:
    import traceback
    from random import randint
    from time import sleep

    import requests

    payload = {
        'some_stuff': True,
    }
    enroll_url = 'https://foo.ca'
    expected = '''some string'''

    # use selenium to login and (critically) run js on homepage to generate cookies
    # then quit selenium and use the cookies to setup a requests session
    def login(username, password):
        # <selenium code to login snipped>
        # retrieve all the cookies and kill webdriver since we don't need it anymore
        cookies = driver.get_cookies()
        driver.quit()
        s = requests.Session()
        for cookie in cookies:
            s.cookies.set(cookie['name'], cookie['value'])
            if(cookie['name'] == 'XSRF-TOKEN'):
                s.headers.update({
                    'X-XSRF-TOKEN': cookie['value'],
                    'Connection':'close',
                })
        return s

    def attemptEnroll(session):
        if(session is None):
            return ""
        r = session.post(enroll_url, json=payload)
        return r.text

    # number of failed attempts in a row
    failed_count = 0
    session = None
    while True:
        worked = False
        errorMsg = "Unknown error"
        try:
            resp = attemptEnroll(session)
            worked = (resp == expected)
            errorMsg = resp
        except Exception, e:
            errorMsg = str(e) + traceback.format_exc()
        if(worked):
            failed_count = 0
            # wait 2-7 minutes between requests
            wait=randint(2*60,7*60)
            sleep(wait)
        else:
            sleep(failed_count*60)
            failed_count += 1
            # stop after 3 failures in a row
            if(failed_count >= 3):
                break
            # otherwise create a new login and try again
            session = login("<snip>", "<snip>!")