
I'm looking for information on thread safety of urllib2 and httplib. The official documentation (http://docs.python.org/library/urllib2.html and http://docs.python.org/library/httplib.html) lacks any information on this subject; the word thread is not even mentioned there...

UPDATE

Ok, they are not thread-safe out of the box. What's required to make them thread-safe, or is there a scenario in which they can be thread-safe? I'm asking because it seems that

  • using separate OpenerDirector in each thread
  • not sharing HTTP connection among threads

would suffice to safely use these libs in threads. A similar usage scenario was proposed in the question "urllib2 and cookielib thread safety".
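The scenario described above could be sketched roughly as follows. This is only an illustration, not a confirmed-safe recipe: the names `_local`, `thread_opener` and `fetch` are hypothetical, and the import falls back to `urllib.request` (the Python 3 counterpart of urllib2) so the sketch runs on either version.

```python
import threading

try:
    import urllib2 as urlreq          # Python 2
except ImportError:
    import urllib.request as urlreq   # Python 3 counterpart

# Hypothetical per-thread storage; each thread sees its own attributes.
_local = threading.local()


def thread_opener():
    """Return an OpenerDirector that belongs to the calling thread only."""
    opener = getattr(_local, "opener", None)
    if opener is None:
        opener = urlreq.build_opener()
        _local.opener = opener
    return opener


def fetch(url, timeout=10):
    """Fetch url through this thread's own opener; no opener is shared."""
    response = thread_opener().open(url, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Because the global opener used by `urlopen()` is never touched, `install_opener()` calls in other threads should not affect these requests.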

Yuri
Piotr Dobrogost

1 Answer


httplib and urllib2 are not thread-safe.

urllib2 does not provide serialized access to a global (shared) OpenerDirector object, which is used by urllib2.urlopen().

Similarly, httplib does not provide serialized access to HTTPConnection objects (i.e. by using a thread-safe connection pool), so sharing HTTPConnection objects between threads is not safe.

I suggest using httplib2 or urllib3 as an alternative if thread-safety is required.

Generally, if a module's documentation does not mention thread-safety, I would assume it is not thread-safe. You can look at the module's source code for verification.

When browsing the source code to determine whether a module is thread-safe, you can start by looking for uses of thread synchronization primitives from the threading or multiprocessing modules, or use of Queue.Queue (queue.Queue in Python 3).
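As a rough illustration of that first pass, the sketch below searches a module's source text for a few well-known synchronization names. The helper name `find_sync_primitives` and the `SYNC_NAMES` list are hypothetical and far from exhaustive. A hit does not prove the module is thread-safe, and an empty result does not prove the opposite; it only tells you where to start reading.

```python
import re

# Hypothetical, non-exhaustive list of names worth searching for.
SYNC_NAMES = ("Lock", "RLock", "Condition", "Semaphore", "Event", "Queue")


def find_sync_primitives(source_text):
    """Return a sorted list of the synchronization names found in source_text.

    This is a plain text search, so names inside comments or strings match too.
    """
    found = set()
    for name in SYNC_NAMES:
        # \b keeps "Lock" from matching inside "RLock".
        if re.search(r"\b%s\b" % name, source_text):
            found.add(name)
    return sorted(found)
```

For a pure-Python module, the source text can be obtained with `inspect.getsource(module)` and passed to the helper.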

UPDATE

Here is a relevant source code snippet from urllib2.py (Python 2.7.2):

_opener = None
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
    global _opener
    if _opener is None:
        _opener = build_opener()
    return _opener.open(url, data, timeout)

def install_opener(opener):
    global _opener
    _opener = opener

There is an obvious race condition when concurrent threads call install_opener() and urlopen().
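One way to avoid that particular race is to stop relying on urllib2's global and keep your own opener behind a lock. The following is a minimal sketch under that assumption: `get_opener`, `set_opener`, `_lock` and `_shared_opener` are hypothetical names, not part of urllib2, and the import falls back to `urllib.request` so the sketch also runs on Python 3. It only makes the check-then-assign step atomic; it says nothing about whether the handlers inside a shared opener are themselves safe to use concurrently.

```python
import threading

try:
    import urllib2 as urlreq          # Python 2
except ImportError:
    import urllib.request as urlreq   # Python 3 counterpart

_lock = threading.Lock()
_shared_opener = None  # hypothetical replacement for urllib2's global _opener


def get_opener():
    """Return the shared opener, building it at most once."""
    global _shared_opener
    with _lock:
        if _shared_opener is None:
            _shared_opener = urlreq.build_opener()
        return _shared_opener


def set_opener(opener):
    """Replace the shared opener without racing against get_opener()."""
    global _shared_opener
    with _lock:
        _shared_opener = opener
```

Callers would then use `get_opener().open(url)` instead of `urlopen(url)`, so the unguarded global is never read or written.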

Also, note that calling urlopen() with a Request object as the url parameter may mutate the Request object (see the source for OpenerDirector.open()), so it is not safe to concurrently call urlopen() with a shared Request object.

All told, urlopen() is thread-safe if the following conditions are met:

  • install_opener() is not called from another thread.
  • A non-shared Request object, or a string, is used as the url parameter.
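The second condition can be met by building the Request inside the function that uses it, so no other thread ever holds a reference. This is a sketch only: `make_request` and `fetch` are hypothetical helpers, the User-Agent value is a placeholder, and the code assumes `install_opener()` is not called concurrently. The import falls back to `urllib.request` so it also runs on Python 3.

```python
try:
    import urllib2 as urlreq          # Python 2
except ImportError:
    import urllib.request as urlreq   # Python 3 counterpart


def make_request(url):
    """Build a fresh Request object for a single call."""
    return urlreq.Request(url, headers={"User-Agent": "example-client"})


def fetch(url, timeout=10):
    """Open url with a Request that is created and used by this call only."""
    request = make_request(url)
    response = urlreq.urlopen(request, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Each worker thread calls `fetch(url)` with a plain string, so any mutation performed by `OpenerDirector.open()` stays confined to that call's own Request.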
Yuri
Gregg
  • @Gregg - Could you say something about how you would determine thread-safety by inspecting a particular module's code? I often have this exact question... – ire_and_curses Apr 28 '11 at 22:23
  • 2
    @ire_and_curses: I've expanded my answer accordingly. – Gregg Apr 29 '11 at 01:42
  • 3
    The idea of forcing users to inspect a library's source code to find out whether the library is thread-safe looks strange to me. There are libraries that use synchronization code and are not thread-safe (cookielib), and there are libraries that do not use synchronization code and are thread-safe because they utilize lock-free structures and algorithms. – Piotr Dobrogost Apr 29 '11 at 18:53
  • 6
    @Piotr Dobrogost: I agree that users should not be *forced* to inspect a library's source code to find out if it is thread-safe. If a library is developed with thread-safety in mind, then I assume the docs will indicate this. If the docs do not talk about thread-safety, then I assume the library is not thread safe. To verify my assumption, a peek at the library's code is often necessary. Regarding lock-free data structures and cookielib, thread-safety is a complicated topic and I only provided a baseline of things to look for within a module that *may* indicate it is thread-safe. – Gregg Apr 29 '11 at 22:17
  • Is this actually true? From the code, there is a shared OpenerDirector object, but HTTP requests will be handled by the HTTPHandler, which doesn't have any meaningful state. So each open() call will ultimately result in a new HTTPConnection object (line 1116, urllib2.py). At that point, it doesn't matter whether the HTTPConnection object is thread-safe, since there will be a different instance of it per call to urllib2.urlopen. This seems to support that: http://mail.python.org/pipermail/python-list/2005-January/916884.html. It also looks from that thread like they document when things aren't thread-safe. – Pete Aykroyd May 19 '11 at 23:26
  • @PeteAykroyd Referencing source code by line number without giving version is useless. For instance line [1116](http://hg.python.org/cpython/file/8527427914a2/Lib/urllib2.py#l1116) of `urllib2.py` in Python 2.7.2 is a blank line... – Piotr Dobrogost Dec 11 '11 at 20:06
  • @PiotrDobrogost Good point. Looking now at Python 2.7.2, urllib2.py, I've just quickly re-traced the code, and it seems that if you are opening an HTTP connection, you end up calling http_open (line 1199), which calls do_open with the request object and the class httplib.HTTPConnection. The do_open function then creates a new HTTPConnection and uses that for the request. This answer seems to assume that each HTTP request shares the same HTTPConnection, and it seems to me that this is not the case. – Pete Aykroyd Dec 14 '11 at 19:20
  • Could you provide a single example when `urllib2.urlopen(url)` is not safe to call from several threads? – jfs Feb 10 '12 at 23:40
  • @Gregg: Can't upvote you twice. I've added the link to the `OpenerDirector.open()` source code. – jfs Mar 01 '12 at 23:14
  • If you do not use `urllib2.urlopen`, but instead use `OpenerDirector.open()` *AND* you do not share request objects, then this should be thread safe. – speedplane Apr 26 '16 at 18:10
  • urllib3 documents that it is thread-safe, but httplib2 does not, as far as I can tell. – Flimm Nov 08 '16 at 16:39