
I'm implementing a little service that fetches web pages from various servers. I need to be able to configure different types of timeouts. I've tried mucking around with the settimeout method of sockets, but it doesn't behave exactly the way I'd like. Here are the problems.

  1. I need to specify a timeout for the initial DNS lookup. I understand this is done when I instantiate the HTTPConnection at the beginning.

  2. My code is written in such a way that I first .read a chunk of data (around 10 MB) and, if the entire payload fits in it, I move on to other parts of the code. If it doesn't fit, I stream the payload directly to a file rather than into memory. When this happens, I do an unbounded .read() to get the data, and if the remote side sends me, say, a byte of data every second, the connection just keeps waiting, receiving one byte every second. I want to be able to disconnect with a "you're taking too long". A thread-based solution would be the last resort. (A rough sketch of this pattern follows below.)
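
To make the pattern concrete, here is a rough sketch of what I described above (the host, path, chunk limit, and file name are made up; my actual code isn't shown):

    # Rough sketch of the pattern described above (hypothetical names, Python 2 httplib).
    import httplib

    CHUNK_LIMIT = 10 * 1024 * 1024  # roughly 10 MB

    conn = httplib.HTTPConnection('example.com', timeout=10)  # timeout covers individual socket operations
    conn.request('GET', '/big-file')
    response = conn.getresponse()

    head = response.read(CHUNK_LIMIT)
    if len(head) < CHUNK_LIMIT:
        payload = head                      # the whole payload fit in memory
    else:
        with open('payload.tmp', 'wb') as f:
            f.write(head)
            f.write(response.read())        # unbounded read: a 1-byte-per-second sender keeps this alive indefinitely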

Noufal Ibrahim

2 Answers


httplib is too straightforward for what you are looking for.

I would recommend taking a look at http://pycurl.sourceforge.net/ and the http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTTIMEOUT option.

The http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPT_NOSIGNAL option also sounds interesting:

Consider building libcurl with c-ares support to enable asynchronous DNS lookups, which enables nice timeouts for name resolves without signals.
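
To make that concrete, here is a rough pycurl sketch (the URL, the CONNECTTIMEOUT value, and the other limits are my own assumptions, not from the libcurl docs quoted above):

    import pycurl
    from io import BytesIO

    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, 'http://example.com/big-file')
    c.setopt(pycurl.CONNECTTIMEOUT, 10)   # connection phase must finish within 10 seconds
    c.setopt(pycurl.TIMEOUT, 60)          # the whole transfer must finish within 60 seconds
    c.setopt(pycurl.NOSIGNAL, 1)          # avoid signal-based timeouts (safer with threads)
    c.setopt(pycurl.WRITEFUNCTION, buf.write)
    try:
        c.perform()                       # raises pycurl.error on timeout
    finally:
        c.close()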

Raphael Bossek
  • That would work. It's my last resort though since I'd like to avoid external dependencies (especially C ones) as much as possible. – Noufal Ibrahim Apr 26 '12 at 07:39
  • @NoufalIbrahim: [I've tried `pycurl.TIMEOUT` and it does work](http://stackoverflow.com/a/32685765/4279) – jfs Sep 21 '15 at 01:00

Have you tried requests?

You can set timeouts conveniently; see http://docs.python-requests.org/en/latest/user/quickstart/#timeouts

>>> requests.get('http://github.com', timeout=0.001)

EDIT: I missed part 2 of the question. For that you could use this:

import signal
import requests

class TimeoutException(Exception):
    pass

def get_timeout(url, dns_timeout=10, load_timeout=60):
    def timeout_handler(signum, frame):
        raise TimeoutException()

    # Raise TimeoutException if the whole request (including reading the body)
    # takes longer than load_timeout seconds.
    signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(load_timeout)  # trigger the alarm in load_timeout seconds

    try:
        response = requests.get(url, timeout=dns_timeout)
    except TimeoutException:
        return "you're taking too long"
    finally:
        signal.alarm(0)  # cancel the pending alarm
    return response

and in your code use the get_timeout function.

If you need the timeout to be available for other functions you could create a decorator. The above code is adapted from http://pguides.net/python-tutorial/python-timeout-a-function/.
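
A minimal sketch of that decorator idea, reusing the TimeoutException class and the requests import from above (the names and default values are made up):

    import functools
    import signal

    def timeout(seconds=60):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                def handler(signum, frame):
                    raise TimeoutException()
                old_handler = signal.signal(signal.SIGALRM, handler)
                signal.alarm(seconds)              # schedule SIGALRM
                try:
                    return func(*args, **kwargs)
                finally:
                    signal.alarm(0)                # cancel the pending alarm
                    signal.signal(signal.SIGALRM, old_handler)
            return wrapper
        return decorator

    @timeout(seconds=30)
    def fetch(url):
        return requests.get(url, timeout=10)

Like the function above, this only works in the main thread because of the signal handling.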

satran
  • It says quite clearly (just after your link) that the timeout is only for the connection process and not to get the payload. That's the problem I have. Also, this doesn't solve the question of DNS lookups. – Noufal Ibrahim May 01 '12 at 12:32
  • I'm sorry, I overlooked your question. You are correct and I have edited the answer to try to solve part 2 of the question too. – satran May 01 '12 at 15:21
  • This might not work when there are multiple threads for this process. Signal based timeouts are risky in that scenario. – Noufal Ibrahim May 01 '12 at 16:08
  • @NoufalIbrahim: `requests`' `timeout` option limits only individual `socket` operations (recent versions allow specifying the *connection* timeout separately from the *read* timeout and note explicitly that these timeouts are not totals, i.e., both the connection and reading stages may involve multiple socket operations, so the total time may be larger). – jfs Sep 21 '15 at 01:03
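
For an overall cap on the payload transfer using requests alone, one possible sketch (stream=True, the chunk size, and the limits are assumptions; the tuple timeout form needs a reasonably recent requests version):

    import time
    import requests

    def fetch_with_deadline(url, connect_timeout=10, read_timeout=10, total_timeout=60):
        deadline = time.time() + total_timeout
        response = requests.get(url, stream=True,
                                timeout=(connect_timeout, read_timeout))
        chunks = []
        for chunk in response.iter_content(chunk_size=64 * 1024):
            chunks.append(chunk)
            if time.time() > deadline:        # enforce an overall deadline across all reads
                response.close()
                raise RuntimeError("total timeout exceeded")
        return b''.join(chunks)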