
Overview

I am using urlopen from the Python 2.7.1 urllib2 package to do an HTTP POST from a Windows XP machine to a remote Apache webserver (for instance the built-in web sharing of Mac OS X). The sent data contains an identifier, the data itself and a checksum; once all data has been sent, the server responds with an acknowledgement. The checksum in the data can be used to check whether everything arrived in good order.

The Problem

Usually this works great. Sometimes, however, the internet connection is bad, often because the client sending the data is on a wifi or 3G connection, which results in a loss of connectivity for some arbitrary amount of time. urlopen has a timeout option to make sure that this does not block your program, so it can continue.

This is what I want, but the problem is that urlopen does not stop the socket from continuing to send whatever data it still had to send when the timeout occurred. I have tested this (with the code shown below) by trying to send a large chunk of data to my laptop: I would see the network activity on both machines, then stop the wireless on the laptop, wait until the function timed out, and reactivate the wireless. The data transfer would then simply continue, but the program would no longer be listening for responses. I even tried exiting the Python interpreter and it would still send data, so control of that is somehow handed over to Windows.

Causes

The timeout (as I understand it) works like this: It checks for an 'idle response time'
( [Python-Dev] Adding socket timeout to urllib2 )
If you set the timeout to 3, it will open the connection, start a counter, then try to send the data and wait for a response; if the timer runs out at any point before the response is received, a timeout exception is raised. Note that sending the data does not seem to count as 'activity' as far as the timeout timer is concerned.
( urllib2 times out but doesn't close socket connection )
( Close urllib2 connection )
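To illustrate what that means in practice, here is a minimal sketch with a raw socket (placeholder host, untested): the limit applies to each blocking call separately, not to the request as a whole.

import socket

# Each blocking operation may wait up to 3 seconds before raising
# socket.timeout; a send that keeps making progress does not trip it.
s = socket.create_connection(('example.com', 80), timeout=3)
s.sendall('GET / HTTP/1.0\r\nHost: example.com\r\n\r\n')
try:
    # raises socket.timeout after 3 seconds without any incoming data
    response = s.recv(4096)
except socket.timeout:
    print 'No data for 3 seconds; note the socket itself is still open here'
finally:
    s.close()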

Apparently it is stated somewhere that when a socket is closed/dereferenced/garbage collected, it calls its 'close' function, which waits for all data to be sent before closing the socket. However, there is also a shutdown function, which should stop the socket immediately, preventing any more data from being sent.
( socket.shutdown vs socket.close )
( http://docs.python.org/library/socket.html#socket.socket.close )
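As a rough sketch of the difference (placeholder host, untested): close() merely drops this reference to the socket, while shutdown() tells the OS that no further sends or receives are allowed. Data already sitting in the kernel send buffer may still be flushed either way.

import socket

s = socket.create_connection(('example.com', 80))
s.sendall('some data')
s.shutdown(socket.SHUT_RDWR)  # disallow any further sends and receives
s.close()                     # then release the file descriptor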

What I Want

I want the connection to be shut down when a timeout occurs. Otherwise my client cannot tell whether the data was received properly, and it might try to send it again. I'd rather just kill the connection and try again later, knowing that the data was (probably) not sent successfully (the server can recognize this if the checksum does not match).

Here is part of the code that I used to test this. The try..except parts do not yet work as I'd expect; any help there is also appreciated. As I said before, I want the program to shut down the socket as soon as the timeout (or any other) exception is raised.

from urllib import urlencode
from urllib2 import urlopen, HTTPError, URLError
import socket
import sys

class Uploader:
    def __init__(self):
        self.URL = "http://.../"
        self.data = urlencode({'fakerange':range(0,2000000,1)})
        print "Data Generated"

    def upload(self):
        try:
            # POST the data; the timeout covers periods of inactivity,
            # not the total transfer time
            f = urlopen(self.URL, self.data, timeout=10)
            returncode = f.read()
        except (URLError, HTTPError), msg:
            returncode = str(msg)
        except socket.error:
            returncode = "Socket Timeout!"
        # hand the server's response (or the error text) back to the caller
        return returncode

def main():
    upobj = Uploader()
    returncode = upobj.upload()

    if returncode == '100':
        print "Success!"
    else:
        print "Maybe a Fail"
        print returncode
    print "The End"

if __name__ == '__main__':
    main()
153957
  • One idea is to create a timeout manager client-side, say like this: if I don't get a success status back from the server within x seconds, I will shut down the existing connection and try again fresh later. – Hoff Nov 08 '11 at 17:49
  • @Hoff That is what the timeout in urlopen should do, or at least that is what I would expect from it. However, it does not shut down the existing connection; it continues to run in the background, even if Python is exited entirely. I do not know how (or whether it is possible) to make a separate program that looks for these timed-out connections and shuts them down; how would you even reference them? The main problem, I believe, is that once the timeout exception occurs you can no longer access the socket of the urlopen to shut it down. – 153957 Nov 09 '11 at 09:28

5 Answers


I found some code that might help you on this thread:

from urllib2 import urlopen
from threading import Timer

url = "http://www.python.org"

def handler(fh):
    # called by the timer: close the response and its socket
    fh.close()

fh = urlopen(url)
t = Timer(20.0, handler, [fh])  # allow the read 20 seconds
t.start()
data = fh.read()
t.cancel()  # finished in time, cancel the forced close
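A variant of the same idea (untested) would dig down to the raw socket and shut it down instead of just closing the response object; the fh.fp._sock.fp._sock chain is urllib2's private attribute layout in Python 2 and may differ between versions:

import socket
from urllib2 import urlopen
from threading import Timer

url = "http://www.python.org"

def handler(fh):
    # reach the raw socket through urllib2's private attributes (fragile!)
    fh.fp._sock.fp._sock.shutdown(socket.SHUT_RDWR)
    fh.close()

fh = urlopen(url)
t = Timer(20.0, handler, [fh])
t.start()
data = fh.read()
t.cancel()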
Hoff
  • Thank you, that looks like an interesting idea. I cannot test it at this moment but will try it (I'm fairly new to Python) when I can. I still don't know if this will shut down the socket; shouldn't I call something like fh.fp._sock.fp._sock.shutdown(socket.SHUT_RDWR) (as in [close-urllib2-connection]) instead of fh.close()? And maybe I don't understand, but doesn't the urlopen(url, data) call open the socket (the connection), then send the data and then wait for a response? So I would have to wait for it to finish anyway before I start the Timer? Shouldn't the Timer start before urlopen? – 153957 Nov 17 '11 at 11:54

You might consider using a different API than urllib2. httplib is a bit less pleasant, but often not too bad. It does, however, make it possible for you to access the underlying socket object. So, you could do something like:

import httplib
import socket
from urllib import urlencode

def upload(host, path, data):
    # positional arguments: port 80, strict=True, timeout of 3 seconds
    conn = httplib.HTTPConnection(host, 80, True, 3)
    try:
        conn.request('POST', path, data)
        response = conn.getresponse()
        if response.status != 200:
            # maybe an HTTP error
            return response.status
        else:
            return response.read()
    except socket.error:
        return "Socket Timeout!"
    finally:
        # shutdown() requires a 'how' argument; the socket may be gone
        # already if the connection never completed
        if conn.sock is not None:
            conn.sock.shutdown(socket.SHUT_RDWR)
        conn.close()

def main():
    data = urlencode({'fakerange':range(0,2000000,1)})
    returncode = upload("www.server.com", "/path/to/endpoint", data)

    ...

(Disclaimer: untested)

httplib does have various limitations when compared to urllib2 - it won't automatically handle things like redirects, for example. However, if you're using this to access a relatively fixed API rather than download random things from the internet, it should do the job fine.

Honestly, I probably wouldn't bother to do this myself though; I'm generally content to let the operating system deal with TCP buffers however it wants, even if its approach isn't always completely optimal...

akgood
  • I tried using httplib, with an except clause to do the shutdown (and close) in case of a timeout. During a test, Python ran the shutdown and close commands and exited my script, but in the background the upload was still continuing. So 'shutdown' seems less effective than the name would suggest. In the end we went with the suggestion you made in your last line: we removed the timeout altogether and just let the OS/Apache take care of it, and that seems to work best. – 153957 May 01 '12 at 09:55

It turns out that calling .sock.shutdown(socket.SHUT_RDWR) and .close() on an HTTPConnection that is uploading does not stop the upload; it continues running in the background. I am not aware of a more reliable or direct way to kill the connection from Python while using urllib2 or httplib.
In the end we tested the upload using urllib2 without the timeout. This means that on a slow connection the upload (POST) might take very long, but at least we will know whether it worked or not. There is a possibility that urlopen might hang because there is no timeout, but we tested various bad-connection scenarios and in all cases urlopen either worked or returned an error after some time.
This means that, on the client side, we at least know whether the upload succeeded or failed, and that it does not continue in the background.
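For reference, a minimal sketch of that final approach (using the same placeholder URL as in the question):

from urllib import urlencode
from urllib2 import urlopen, HTTPError, URLError

# No timeout argument: urlopen blocks until the POST either completes or
# the OS gives up on the connection, so no half-sent upload is left
# running in the background.
data = urlencode({'fakerange': range(0, 2000000, 1)})
try:
    f = urlopen("http://.../", data)
    returncode = f.read()
except (URLError, HTTPError), msg:
    returncode = str(msg)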

153957

If calling socket.shutdown really is the only way to cut off the data on timeout, I think you need to resort to some sort of monkey-patching. urllib2 doesn't really offer you the opportunity for that sort of fine-grained socket control.

Check out Source interface with Python and urllib2 for a good approach.
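For example, here is an untested sketch of that approach: subclass HTTPConnection so that every socket urllib2 opens is recorded, and shut them all down when a timeout is detected. The names TrackingHTTPConnection, TrackingHTTPHandler and shutdown_open_sockets are illustrative, not part of any library.

import httplib
import socket
import urllib2

_open_sockets = []

class TrackingHTTPConnection(httplib.HTTPConnection):
    def connect(self):
        httplib.HTTPConnection.connect(self)
        _open_sockets.append(self.sock)  # remember the raw socket

class TrackingHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        # tell urllib2 to build connections with our subclass
        return self.do_open(TrackingHTTPConnection, req)

opener = urllib2.build_opener(TrackingHTTPHandler)

def shutdown_open_sockets():
    # call this when a timeout is detected
    for s in _open_sockets:
        try:
            s.shutdown(socket.SHUT_RDWR)
        except socket.error:
            pass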

Matt Luongo
  • I tried using the shutdown (with httplib), but the data kept flowing even after calling both .shutdown(SHUT_RDWR) and .close(). – 153957 May 01 '12 at 10:02

You could spawn a secondary process using multiprocessing, then shut it down whenever you detect a timeout (a URLError exception with the message "urlopen error timed out").

Stopping the process should be enough to close the socket.
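A rough, untested sketch of that idea (the function names are illustrative):

import multiprocessing
from urllib2 import urlopen

def _do_upload(url, data, queue):
    try:
        queue.put(urlopen(url, data).read())
    except Exception, e:
        queue.put(str(e))

def upload_with_deadline(url, data, seconds):
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=_do_upload, args=(url, data, queue))
    p.start()
    p.join(seconds)       # wait up to the deadline
    if p.is_alive():
        p.terminate()     # kill the worker, taking its socket with it
        p.join()
        return None       # treat as a failure and retry later
    return queue.get()

One caveat: data already handed to the OS send buffer may still be flushed after the process dies, which would match the behaviour described in the comment below.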

Giacomo Lacava
  • But stopping which process? I tried to exit the Python interpreter and it would still continue to send data, so control of that is handed over to some Windows buffer. – 153957 Mar 29 '12 at 12:25