I have a Python application that uses the threading and requests modules to process many pages. The basic page-download function looks like this:

import io
import requests

def get_page(url):
    # headers and SpyderError are defined elsewhere in the application
    error = None
    data = None
    max_page_size = 10 * 1024 * 1024

    try:
        s = requests.Session()
        s.max_redirects = 10
        s.keep_alive = False

        r = s.get('http://%s' % url if not url.startswith('http://') else url,
                  headers=headers, timeout=10.0, stream=True)
        raw_data = io.BytesIO()
        size = 0
        # stream the body in chunks so oversized responses can be aborted
        for chunk in r.iter_content(4096):
            size += len(chunk)
            raw_data.write(chunk)
            if size > max_page_size:
                r.close()
                raise SpyderError('too_large')
        fetch_result = 'ok'
    finally:
        del s

It works well in most cases, but sometimes the application freezes because of a very slow connection to some servers or other network problems. How can I set up a guaranteed global timeout for the whole function? Should I use asyncio or coroutines?
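
As far as I understand, the timeout=10.0 argument above only limits individual socket operations (connect and each read), not the total download time, so a server that keeps trickling data can keep the call alive far past 10 seconds. What I have in mind is something like the rough sketch below (just an idea, not code I currently run; get_page_with_timeout and the pool size are placeholders). It runs get_page() in a worker thread and stops waiting after a deadline, although I realize the worker thread itself keeps running in the background until its own socket operations finish or fail:

import concurrent.futures

# worker pool for page downloads; the pool size here is arbitrary
executor = concurrent.futures.ThreadPoolExecutor(max_workers=10)

def get_page_with_timeout(url, timeout=30.0):
    # run get_page() in a worker thread and give up waiting after `timeout`
    # seconds; the worker thread is not killed, it just stops being waited on
    future = executor.submit(get_page, url)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return None  # treat the fetch as failed

Would wrapping every call like this be a reasonable way to get a hard upper bound, or is there a better approach?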
