I am making several HTTP requests to a particular host using Python's urllib2 library. Each time a request is made, a new TCP and HTTP connection is created, which takes a noticeable amount of time. Is there any way to keep the TCP/HTTP connection alive using urllib2?
3 Answers
27
If you switch to httplib, you will have finer control over the underlying connection.
For example:
import httplib
conn = httplib.HTTPConnection(host)  # host name such as 'www.example.com', not a full URL
conn.request('GET', '/foo')
r1 = conn.getresponse()
r1.read()
conn.request('GET', '/bar')
r2 = conn.getresponse()
r2.read()
conn.close()
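This would send 2 HTTP GETs on the same underlying TCP connection, provided the server keeps the connection alive. Each response has to be read in full before the next request is sent.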
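For readers on Python 3, where httplib was renamed to http.client, the same idea might look like the sketch below. fetch_all and its parameters are names made up for this illustration and are not part of any library. The sketch assumes the server supports HTTP/1.1 keep-alive.

```python
import http.client


def fetch_all(host, paths, port=80, timeout=10):
    """GET every path over one connection; return (status, body) pairs.

    Hypothetical helper, not a library function.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # Read the whole body before sending the next request,
            # otherwise http.client refuses to reuse the connection.
            results.append((resp.status, resp.read()))
    finally:
        conn.close()
    return results
```

If the server answers with "Connection: close", http.client reconnects on the next request, so the loop still works but without the reuse benefit.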

tuxayo

Corey Goldberg
That is a good answer since httplib is part of python. That saves us from having to install a third party module. Thx! – Antoine 'hashar' Musso Apr 14 '13 at 16:54
This may be useful to someone: there is also HTTPSConnection. – Peter May 12 '16 at 12:26
2
I've used the third-party urllib3 library to good effect in the past. It's designed to complement urllib2 by pooling connections for reuse.
Modified example from the wiki:
>>> from urllib3 import HTTPConnectionPool
>>> # Create a connection pool for a specific host
... http_pool = HTTPConnectionPool('www.google.com')
>>> # simple GET request, for example
... r = http_pool.urlopen('GET', '/')
>>> print r.status, len(r.data)
200 28050
>>> r = http_pool.urlopen('GET', '/search?q=hello+world')
>>> print r.status, len(r.data)
200 79124
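To illustrate what "pooling connections for reuse" means, here is a minimal standard-library sketch in Python 3. It is not how urllib3 is implemented. SimplePool and its methods are hypothetical names, and the sketch leaves out retries, stale-connection handling and HTTPS.

```python
import http.client
import queue


class SimplePool:
    """Hypothetical, minimal connection pool for a single host."""

    def __init__(self, host, port=80, maxsize=2, timeout=10):
        self.host, self.port, self.timeout = host, port, timeout
        self._idle = queue.LifoQueue(maxsize)  # connections waiting for reuse

    def _get_conn(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            return http.client.HTTPConnection(
                self.host, self.port, timeout=self.timeout)

    def _put_conn(self, conn):
        try:
            self._idle.put_nowait(conn)
        except queue.Full:
            conn.close()  # pool is full, so drop the extra connection

    def urlopen(self, method, path, body=None, headers=None):
        conn = self._get_conn()
        try:
            conn.request(method, path, body, headers or {})
            resp = conn.getresponse()
            data = resp.read()  # drain the body so the connection is reusable
        except Exception:
            conn.close()
            raise
        self._put_conn(conn)
        return resp.status, data
```

Calling pool.urlopen('GET', '/') twice on the same SimplePool should send both requests over one TCP connection as long as the server keeps it open.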

Greg Haskins
0
If you need something more automatic than plain httplib, this might help, though it's not thread-safe because all callers share one module-level dictionary of connections.
try:
    from http.client import HTTPConnection, HTTPSConnection  # Python 3
except ImportError:
    from httplib import HTTPConnection, HTTPSConnection  # Python 2
import select

connections = {}  # (scheme, host) -> cached connection

def request(method, url, body=None, headers=None, **kwargs):
    # 'http://host/path' -> 'http:', '', 'host', 'path' (path may be missing)
    scheme, _, host, path = (url.split('/', 3) + [''])[:4]
    h = connections.get((scheme, host))
    # A closed socket, or an idle socket that is readable, means the
    # cached connection can no longer be used.
    if h and (h.sock is None or select.select([h.sock], [], [], 0)[0]):
        h.close()
        h = None
    if not h:
        Connection = HTTPConnection if scheme == 'http:' else HTTPSConnection
        h = connections[(scheme, host)] = Connection(host, **kwargs)
    h.request(method, '/' + path, body, headers or {})
    # The caller must read() the response before the next request to this host.
    return h.getresponse()

def urlopen(url, data=None, *args, **kwargs):
    resp = request('POST' if data else 'GET', url, data, *args, **kwargs)
    assert resp.status < 400, (resp.status, resp.reason, resp.read())
    return resp
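One possible way around the thread-safety caveat, sketched for Python 3 only, is to keep the connection cache in threading.local so that each thread reuses only its own connections. This is a variation on the code above, not the answer author's method, and _local is a name introduced here for illustration.

```python
import select
import threading
from http.client import HTTPConnection, HTTPSConnection

_local = threading.local()  # each thread sees its own attributes


def request(method, url, body=None, headers=None, **kwargs):
    """Send a request, reusing the calling thread's connection to the host."""
    scheme, _, host, path = (url.split("/", 3) + [""])[:4]
    cache = getattr(_local, "connections", None)
    if cache is None:
        cache = _local.connections = {}  # first call in this thread
    conn = cache.get((scheme, host))
    # Drop a cached connection that is closed or unexpectedly readable.
    if conn and (conn.sock is None
                 or select.select([conn.sock], [], [], 0)[0]):
        conn.close()
        conn = None
    if conn is None:
        cls = HTTPConnection if scheme == "http:" else HTTPSConnection
        conn = cache[(scheme, host)] = cls(host, **kwargs)
    conn.request(method, "/" + path, body, headers or {})
    # The caller must read() the response before the next request.
    return conn.getresponse()
```

The trade-off is that connections are never shared between threads, so a program with many threads opens at least one connection per thread and host.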

Collin Anderson