I have a Python application that uses the threading and requests modules to process many pages. The basic page-download function looks like this:
import io
import requests

def get_page(url):
    # headers and SpyderError are defined elsewhere in the application
    error = None
    data = None
    max_page_size = 10 * 1024 * 1024
    try:
        s = requests.Session()
        s.max_redirects = 10
        s.keep_alive = False
        r = s.get('http://%s' % url if not url.startswith('http://') else url,
                  headers=headers, timeout=10.0, stream=True)
        raw_data = io.BytesIO()
        size = 0
        for chunk in r.iter_content(4096):
            size += len(chunk)
            raw_data.write(chunk)
            if size > max_page_size:
                r.close()
                raise SpyderError('too_large')
        fetch_result = 'ok'
    finally:
        del s
It works well in most cases, but sometimes the application freezes because of a very slow connection to some server or because of other network problems. The timeout=10.0 passed to requests only limits individual blocking socket operations, not the total duration of the download. How can I set up a guaranteed timeout for the whole function? Should I use asyncio or coroutines?
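For context, the only workaround I know of is to run the fetch in a worker thread and abandon it when an overall deadline passes, via concurrent.futures. A minimal sketch (slow_fetch is a hypothetical stand-in for get_page; note the worker thread keeps running in the background until its blocking call eventually returns, so this unblocks the caller but does not free the connection):

```python
import concurrent.futures
import time

def slow_fetch(url):
    # stand-in for get_page(): simulates a hung connection
    time.sleep(2)
    return 'data'

def fetch_with_deadline(url, deadline=1.0):
    # Run the fetch in a one-thread pool and give up after `deadline` seconds.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(slow_fetch, url)
    try:
        return future.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        return None  # caller moves on; the worker thread lingers until done
    finally:
        pool.shutdown(wait=False)  # don't block waiting for the hung worker
```

This caps the caller's wait time but leaks a thread per hung request, which is why I'm asking whether asyncio or coroutines would be a cleaner way to get a hard overall timeout.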