
I have an API manager that connects to a URL and grabs some JSON. Very simple. An excerpt from the method:

req = Request(url)
socket.setdefaulttimeout(timeout)
resp = urlopen(req, None, timeout)
data = resp.read()
resp.close()

It works fine most of the time, but at random intervals the request takes 5 s to complete, even when timeout is set to 0.5, 1.0 or any other value. I have logged it very closely, so I am 100% sure that the slow line is line 3 of the excerpt (i.e. resp = urlopen(req, None, timeout)).

I've tried every solution I could find on the topic of timeout decorators, Timers and so on. To list some of them: "Python urllib2.urlopen freezes script infinitely even though timeout is set", "How can I force urllib2 to time out?", "Timing out urllib2 urlopen operation in Python 2.4", "Timeout function if it takes too long to finish".

But nothing works. My impression is that the thread freezes while urlopen does something, and only when it is done does the thread unfreeze, at which point all the timers and timeouts return with timeout errors. The execution time is still more than 5 s.

I've found an old mailing list thread about urllib2 and its handling of chunked encoding. If that problem is still present, one solution might be to write a custom urlopen based on httplib.HTTP instead of httplib.HTTPConnection. Another possible solution is to try some multithreading magic.

Both solutions seem too aggressive, and it bugs me that the timeout does not work all the way.

It is very important that the execution time of the script does not exceed 0.5 s. Does anyone know why I am experiencing these freezes, or have a way to work around them?

Update based on accepted answer: I changed the approach and now use curl instead. Together with the Unix timeout command it works just as I want. Example code follows:

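from subprocess import Popen, PIPE

# Both limits are passed to the commands as strings
t_timeout = str(API_TIMEOUT_TIME)   # hard limit enforced by the Unix timeout command
c_timeout = str(CURL_TIMEOUT_TIME)  # curl's own --max-time limit
cmd = ['timeout', t_timeout, 'curl', '--max-time', c_timeout, url]
prc = Popen(cmd, stdout=PIPE, stderr=PIPE)
response = prc.communicate()  # (stdout, stderr) tuple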

Since curl only accepts an int for --max-time, I wrapped it in the timeout command, which accepts floats.
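For readers on Python 3.5 or later, the same hard limit can probably be had without the external timeout command, because subprocess.run takes a float timeout and kills the child when it expires. This is only a minimal sketch, not the code used above: run_with_deadline, slow_cmd and fast_cmd are hypothetical names, and the two commands are stand-ins so the example runs without network access.

```python
import subprocess
import sys


def run_with_deadline(cmd, seconds):
    """Run cmd and return its stdout as bytes, or None if it overran."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=seconds,  # float seconds; the child is killed on expiry
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout


# Stand-ins for a slow and a fast request (assumptions for illustration).
slow_cmd = [sys.executable, "-c", "import time; time.sleep(5)"]
fast_cmd = [sys.executable, "-c", "print('ok')"]
```

With curl installed, the call would look something like run_with_deadline(['curl', '--silent', url], 0.5), where url is the API endpoint from the question.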


1 Answer


Looking through the source code, the timeout value is actually the maximum amount of time that Python will wait between receiving packets from the remote host.

So if you set the timeout to two seconds, and the remote host sends 60 packets at the rate of one packet per second, the timeout will never occur, although the overall process will still take 60 seconds.
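A small local experiment can make this concrete. The sketch below is written for Python 3 and uses a raw socket rather than urllib2; slow_server and timed_read are hypothetical names. The server sends four one-byte chunks 0.2 s apart, and the client uses a 0.5 s timeout. No single wait exceeds the timeout, so no error is raised, even though the whole read takes longer than 0.5 s.

```python
import socket
import threading
import time


def slow_server(listener, chunks, delay):
    """Accept one connection and send each chunk after a pause."""
    conn, _ = listener.accept()
    with conn:
        for chunk in chunks:
            time.sleep(delay)
            conn.sendall(chunk)


def timed_read(per_recv_timeout=0.5, delay=0.2,
               chunks=(b"a", b"b", b"c", b"d")):
    """Read everything from the slow server; return (data, seconds taken)."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    worker = threading.Thread(target=slow_server,
                              args=(listener, chunks, delay))
    worker.start()

    start = time.monotonic()
    received = b""
    # The timeout applies to each blocking call, not to the whole exchange.
    with socket.create_connection(listener.getsockname(),
                                  timeout=per_recv_timeout) as client:
        while True:
            data = client.recv(1024)
            if not data:  # empty read: the server closed the connection
                break
            received += data
    elapsed = time.monotonic() - start

    worker.join()
    listener.close()
    return received, elapsed
```

Calling timed_read() returns all four bytes together with an elapsed time of roughly 0.8 s, which is above the 0.5 s timeout.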

Since the urlopen() function doesn't return until the remote host has finished sending all the HTTP headers, there's not much you can do about it if the host sends the headers very slowly.

If you need an overall time limit, you'll probably have to implement your own HTTP client with non-blocking I/O.
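One way to approximate an overall limit on a plain socket is to recompute the remaining budget before every read and use that as the socket timeout. The following is a minimal Python 3 sketch under that assumption, not a full HTTP client: DeadlineExceeded, read_until_closed and demo are hypothetical names, and the budget covers only the reads, not DNS resolution or connecting.

```python
import socket
import threading
import time


class DeadlineExceeded(Exception):
    """Raised when the total time budget is used up."""


def read_until_closed(sock, total_budget):
    """Read until EOF, allowing at most total_budget seconds overall."""
    deadline = time.monotonic() + total_budget
    received = b""
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise DeadlineExceeded(received)
        sock.settimeout(remaining)  # shrink the timeout on every pass
        try:
            data = sock.recv(4096)
        except socket.timeout:
            raise DeadlineExceeded(received)
        if not data:
            return received
        received += data


def demo(total_budget=0.5, delay=0.2, count=10):
    """Read from a local slow sender; return (timed_out, seconds taken)."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    def sender():
        conn, _ = listener.accept()
        with conn:
            for _ in range(count):
                time.sleep(delay)
                try:
                    conn.sendall(b"x")
                except OSError:
                    break  # the client gave up and closed the connection

    threading.Thread(target=sender, daemon=True).start()
    start = time.monotonic()
    client = socket.create_connection(listener.getsockname(),
                                      timeout=total_budget)
    try:
        read_until_closed(client, total_budget)
        timed_out = False
    except DeadlineExceeded:
        timed_out = True
    finally:
        client.close()
    elapsed = time.monotonic() - start
    listener.close()
    return timed_out, elapsed
```

In demo(), the sender would need about 2 s to finish, but the read is abandoned after roughly 0.5 s. A real client would also have to bound the connect and name-lookup phases separately.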

Aya
  • Perfect answer. Thank you. Even so, I feel that there should be an overall timeout available. I'll update if I implement a custom HTTP client as suggested. – user2520443 Jun 25 '13 at 16:16