42

How can I make a "keep alive" HTTP request using Python's urllib2?

– ibz

7 Answers

34

Use the urlgrabber library. This includes an HTTP handler for urllib2 that supports HTTP 1.1 and keepalive:

>>> import urllib2
>>> from urlgrabber.keepalive import HTTPHandler
>>> keepalive_handler = HTTPHandler()
>>> opener = urllib2.build_opener(keepalive_handler)
>>> urllib2.install_opener(opener)
>>> 
>>> fo = urllib2.urlopen('http://www.python.org')

Note: you should use urlgrabber version 3.9.0 or earlier, as the keepalive module was removed in version 3.9.1.

There is a port of the keepalive module to Python 3.

– msanders
  • In the second line, it seems it should be `from urlgrabber.keepalive import HTTPHandler` – btk Mar 08 '11 at 20:39
  • Thanks @btk - I've now corrected the code accordingly. I've also added a note re which version of urlgrabber to use as per @jwatt's answer. – msanders Mar 14 '11 at 12:21
  • [A quick port of it I made to Python 3.](http://pastie.org/2388246) Hope it helps someone. – bgw Aug 17 '11 at 22:34
  • Thanks @PiPeep - I've added a link to your port in my answer. – msanders Sep 19 '11 at 15:44
  • This library has some issues with headers and lacks cookie support. You can fix it by copying from urllib2 and httplib but I'd recommend trying another library. – 2371 Nov 25 '11 at 06:22
  • @bgw Does the Python 3 port also support python 2? – speedplane Jul 09 '16 at 21:42
  • @speedplane, I believe it does, however, instead of using that pastie link, you should use https://github.com/wikier/keepalive, which is more actively maintained. I've updated the post. The edit should be visible after it's peer reviewed. – bgw Sep 23 '16 at 05:37
13

Try urllib3, which has the following features (a short usage sketch follows the list):

  • Re-use the same socket connection for multiple requests (HTTPConnectionPool and HTTPSConnectionPool) (with optional client-side certificate verification).
  • File posting (encode_multipart_formdata).
  • Built-in redirection and retries (optional).
  • Supports gzip and deflate decoding.
  • Thread-safe and sanity-safe.
  • Small and easy to understand codebase perfect for extending and building upon. For a more comprehensive solution, have a look at Requests.
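For illustration, here is a minimal sketch of the pooling behaviour (my example, not from the original answer; the host and paths are placeholders):

import urllib3

# One pool per host; requests made through it re-use the same socket,
# which is exactly the keep-alive behaviour urllib2 lacks by default.
pool = urllib3.HTTPConnectionPool('www.python.org', maxsize=1)
r1 = pool.request('GET', '/')
r2 = pool.request('GET', '/about/')  # served over the kept-alive connection
print(r1.status, r2.status)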

Or, for a much more comprehensive solution, try Requests, which supports keep-alive from version 0.8.0 onwards (by using urllib3 internally) and has the following features (again, a sketch follows the list):

  • Extremely simple HEAD, GET, POST, PUT, PATCH, DELETE requests.
  • Gevent support for asynchronous requests.
  • Sessions with cookie persistence.
  • Basic, Digest, and Custom Authentication support.
  • Automatic form-encoding of dictionaries.
  • A simple dictionary interface for request/response cookies.
  • Multipart file uploads.
  • Automatic decoding of Unicode, gzip, and deflate responses.
  • Full support for Unicode URLs and domain names.
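A minimal sketch with a Session (mine, not part of the original answer; requests.Session is the modern spelling, the early 0.x releases used requests.session()):

import requests

# A Session re-uses the underlying urllib3 connection pool, so
# consecutive requests to the same host share one TCP connection.
s = requests.Session()
r1 = s.get('http://www.python.org/')
r2 = s.get('http://www.python.org/about/')  # keep-alive: same socket
print(r1.status_code, r2.status_code)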
– Piotr Dobrogost
6

Or check out httplib's HTTPConnection.
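For example (my sketch, not part of the original answer): keep-alive with httplib amounts to re-using one connection object and reading each response fully before issuing the next request:

try:
    from http.client import HTTPConnection  # Python 3
except ImportError:
    from httplib import HTTPConnection      # Python 2

conn = HTTPConnection('www.python.org')
conn.request('GET', '/')
r1 = conn.getresponse()
r1.read()  # drain the body so the socket can be re-used
conn.request('GET', '/about/')  # sent over the same TCP connection
r2 = conn.getresponse()
print(r1.status, r2.status)
conn.close()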

– Mark
  • How do you enable keep-alive for HTTPConnection? I tried adding `Connection: Keep-Alive` to both request and response headers, but `httplib` still reconnects on each request. – Andriy Tylychko Oct 06 '11 at 12:55
5

Unfortunately, keepalive.py was removed from urlgrabber on 25 Sep 2009 in the following commit, made after urlgrabber was changed to depend on pycurl (which supports keep-alive itself):

http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=commit;h=f964aa8bdc52b29a2c137a917c72eecd4c4dda94

However, you can still get the last revision of keepalive.py here:

http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=blob_plain;f=urlgrabber/keepalive.py;hb=a531cb19eb162ad7e0b62039d19259341f37f3a6

– jwatt
4

Note that urlgrabber does not entirely work with Python 2.6. I fixed the issues (I think) by making the following modifications in keepalive.py.

In keepalive.HTTPHandler.do_open(), remove this:

     if r.status == 200 or not HANDLE_ERRORS:
         return r

And insert this:

     if r.status == 200 or not HANDLE_ERRORS:
         # [speedplane] Must return an addinfourl object
         resp = urllib2.addinfourl(r, r.msg, req.get_full_url())
         resp.code = r.status
         resp.msg = r.reason
         return resp
– speedplane
  • Thanks but it would be nice if you explained what this fixed instead of that useless tagged comment. – 2371 Nov 25 '11 at 03:17
  • The original `r` and your `resp` have the same attributes. addinfourl says "class to add info() and geturl() methods to an open file", but the original already has info() and geturl(). Couldn't work out the benefit. – 2371 Nov 25 '11 at 03:31
3

Please avoid collective pain and use Requests instead. It will do the right thing by default and use keep-alive if applicable.

– Prof. Falken
0

Here's a somewhat similar urlopen() that does keep-alive, though it's not thread-safe.

try:
    from http.client import HTTPConnection, HTTPSConnection  # Python 3
except ImportError:
    from httplib import HTTPConnection, HTTPSConnection  # Python 2
import select

# Open connections, cached per (scheme, host) so they can be re-used.
connections = {}


def request(method, url, body=None, headers={}, **kwargs):
    scheme, _, host, path = url.split('/', 3)
    h = connections.get((scheme, host))
    # If the idle socket is readable, the server has closed it (or sent
    # stray data), so drop the cached connection and open a fresh one.
    if h and select.select([h.sock], [], [], 0)[0]:
        h.close()
        h = None
    if not h:
        Connection = HTTPConnection if scheme == 'http:' else HTTPSConnection
        h = connections[(scheme, host)] = Connection(host, **kwargs)
    h.request(method, '/' + path, body, headers)
    return h.getresponse()


def urlopen(url, data=None, *args, **kwargs):
    resp = request('POST' if data else 'GET', url, data, *args, **kwargs)
    assert resp.status < 400, (resp.status, resp.reason, resp.read())
    return resp
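For illustration (not part of the original answer), usage looks like the standard urlopen(); read each body before the next request so the cached connection can be re-used:

resp = urlopen('http://www.python.org/')
print(resp.status, len(resp.read()))   # drain the body first
resp = urlopen('http://www.python.org/about/')  # re-uses the cached connection
print(resp.status, len(resp.read()))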
– Collin Anderson