42

How can I make a "keep alive" HTTP request using Python's urllib2?

– ibz

7 Answers

34

Use the urlgrabber library. This includes an HTTP handler for urllib2 that supports HTTP 1.1 and keepalive:

>>> import urllib2
>>> from urlgrabber.keepalive import HTTPHandler
>>> keepalive_handler = HTTPHandler()
>>> opener = urllib2.build_opener(keepalive_handler)
>>> urllib2.install_opener(opener)
>>> 
>>> fo = urllib2.urlopen('http://www.python.org')

Note: you should use urlgrabber version 3.9.0 or earlier, as the keepalive module was removed in version 3.9.1.

There is a port of the keepalive module to Python 3.

– msanders
  • In the second line, it seems it should be `from urlgrabber.keepalive import HTTPHandler` – btk Mar 08 '11 at 20:39
  • Thanks @btk - I've now corrected the code accordingly. I've also added a note re which version of urlgrabber to use as per @jwatt's answer. – msanders Mar 14 '11 at 12:21
  • [A quick port of it I made to Python 3.](http://pastie.org/2388246) Hope it helps someone. – bgw Aug 17 '11 at 22:34
  • Thanks @PiPeep - I've added a link to your port in my answer. – msanders Sep 19 '11 at 15:44
  • This library has some issues with headers and lacks cookie support. You can fix it by copying from urllib2 and httplib but I'd recommend trying another library. – 2371 Nov 25 '11 at 06:22
  • @bgw Does the Python 3 port also support python 2? – speedplane Jul 09 '16 at 21:42
  • @speedplane, I believe it does, however, instead of using that pastie link, you should use https://github.com/wikier/keepalive, which is more actively maintained. I've updated the post. The edit should be visible after it's peer reviewed. – bgw Sep 23 '16 at 05:37
13

Try urllib3, which has the following features (a short usage sketch follows the list):

  • Re-use the same socket connection for multiple requests (HTTPConnectionPool and HTTPSConnectionPool) (with optional client-side certificate verification).
  • File posting (encode_multipart_formdata).
  • Built-in redirection and retries (optional).
  • Supports gzip and deflate decoding.
  • Thread-safe and sanity-safe.
  • Small and easy to understand codebase perfect for extending and building upon. For a more comprehensive solution, have a look at Requests.
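For illustration, here is a minimal sketch of the pooling behaviour (my example, not from the original answer; the host and paths are placeholders):

import urllib3

# One pool per host; requests made through it re-use the same socket,
# which is exactly the keep-alive behaviour urllib2 lacks by default.
pool = urllib3.HTTPConnectionPool('www.python.org', maxsize=1)
r1 = pool.request('GET', '/')
r2 = pool.request('GET', '/about/')  # served over the kept-alive connection
print(r1.status, r2.status)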

Or, for a much more comprehensive solution, try Requests, which supports keep-alive from version 0.8.0 onwards (by using urllib3 internally) and has the following features (again, a sketch follows the list):

  • Extremely simple HEAD, GET, POST, PUT, PATCH, DELETE requests.
  • Gevent support for asynchronous requests.
  • Sessions with cookie persistence.
  • Basic, Digest, and Custom Authentication support.
  • Automatic form-encoding of dictionaries.
  • A simple dictionary interface for request/response cookies.
  • Multipart file uploads.
  • Automatic decoding of Unicode, gzip, and deflate responses.
  • Full support for Unicode URLs and domain names.
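A minimal sketch with a Session (mine, not part of the original answer; requests.Session is the modern spelling, the early 0.x releases used requests.session()):

import requests

# A Session re-uses the underlying urllib3 connection pool, so
# consecutive requests to the same host share one TCP connection.
s = requests.Session()
r1 = s.get('http://www.python.org/')
r2 = s.get('http://www.python.org/about/')  # keep-alive: same socket
print(r1.status_code, r2.status_code)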
– Piotr Dobrogost
6

Or check out httplib's HTTPConnection.
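For example (my sketch, not part of the original answer): keep-alive with httplib amounts to re-using one connection object and reading each response fully before issuing the next request:

try:
    from http.client import HTTPConnection  # Python 3
except ImportError:
    from httplib import HTTPConnection      # Python 2

conn = HTTPConnection('www.python.org')
conn.request('GET', '/')
r1 = conn.getresponse()
r1.read()  # drain the body so the socket can be re-used
conn.request('GET', '/about/')  # sent over the same TCP connection
r2 = conn.getresponse()
print(r1.status, r2.status)
conn.close()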

– Mark
  • How do you enable keep-alive for HTTPConnection? I tried adding `Connection: Keep-Alive` to both request and response headers, but `httplib` still reconnects on each request. – Andriy Tylychko Oct 06 '11 at 12:55
5

Unfortunately, keepalive.py was removed from urlgrabber on 25 Sep 2009 in the following commit, made after urlgrabber was changed to depend on pycurl (which supports keep-alive itself):

http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=commit;h=f964aa8bdc52b29a2c137a917c72eecd4c4dda94

However, you can still get the last revision of keepalive.py here:

http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=blob_plain;f=urlgrabber/keepalive.py;hb=a531cb19eb162ad7e0b62039d19259341f37f3a6

– jwatt
4

Note that urlgrabber does not entirely work with Python 2.6. I fixed the issues (I think) by making the following modifications in keepalive.py.

In keepalive.HTTPHandler.do_open(), remove this:

     if r.status == 200 or not HANDLE_ERRORS:
         return r

And insert this:

     if r.status == 200 or not HANDLE_ERRORS:
         # [speedplane] Must return an addinfourl object
         resp = urllib2.addinfourl(r, r.msg, req.get_full_url())
         resp.code = r.status
         resp.msg = r.reason
         return resp
– speedplane
  • Thanks but it would be nice if you explained what this fixed instead of that useless tagged comment. – 2371 Nov 25 '11 at 03:17
  • The original `r` and your `resp` have the same attributes. addinfourl says "class to add info() and geturl() methods to an open file", but the original already has info() and geturl(). Couldn't work out the benefit. – 2371 Nov 25 '11 at 03:31
3

Please avoid collective pain and use Requests instead. It will do the right thing by default and use keep-alive if applicable.

– Prof. Falken
0

Here's a somewhat similar urlopen() that does keep-alive, though it's not thread-safe.

try:
    from http.client import HTTPConnection, HTTPSConnection  # Python 3
except ImportError:
    from httplib import HTTPConnection, HTTPSConnection  # Python 2
import select

# Open connections, cached per (scheme, host) so they can be re-used.
connections = {}


def request(method, url, body=None, headers={}, **kwargs):
    scheme, _, host, path = url.split('/', 3)
    h = connections.get((scheme, host))
    # If the idle socket is readable, the server has closed it (or sent
    # stray data), so drop the cached connection and open a fresh one.
    if h and select.select([h.sock], [], [], 0)[0]:
        h.close()
        h = None
    if not h:
        Connection = HTTPConnection if scheme == 'http:' else HTTPSConnection
        h = connections[(scheme, host)] = Connection(host, **kwargs)
    h.request(method, '/' + path, body, headers)
    return h.getresponse()


def urlopen(url, data=None, *args, **kwargs):
    resp = request('POST' if data else 'GET', url, data, *args, **kwargs)
    assert resp.status < 400, (resp.status, resp.reason, resp.read())
    return resp
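For illustration (not part of the original answer), usage looks like the standard urlopen(); read each body before the next request so the cached connection can be re-used:

resp = urlopen('http://www.python.org/')
print(resp.status, len(resp.read()))   # drain the body first
resp = urlopen('http://www.python.org/about/')  # re-uses the cached connection
print(resp.status, len(resp.read()))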
– Collin Anderson