
I am using urllib2 to load a web page; my code is:

import urllib2

httpRequest = urllib2.Request("http://www....com")
pageContent = urllib2.urlopen(httpRequest)
pageContent.readline()

How can I get hold of the underlying socket so that I can set TCP_NODELAY on it?

With a plain socket object, I would call:

sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
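For reference, here is a minimal Python 3 sketch of the plain-socket case. The option is set on a socket object (not on the `socket` module itself) and can be read back with `getsockopt`; the variable names `sock` and `nodelay` are just illustrative.

```python
import socket

# Create a TCP socket; no connection is needed to set or read the option.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm so small writes are not held back for coalescing.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back; some platforms return a value other than 1,
# so treat any non-zero value as "enabled".
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY enabled:", bool(nodelay))

sock.close()
```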
Dante May Code
Andrey Rubliov
  • Why do you set it when calling a web server??? – jgauffin Dec 04 '12 at 09:49
  • I am polling a website at a specific time, when some information should be published there. Speed is very important, so setting TCP_NODELAY avoids accumulating small portions of data into bigger portions before a packet is sent. – Andrey Rubliov Dec 04 '12 at 16:32
  • *What* 'small portions of data'? The HTTP request will almost certainly be flushed by the library in a single send() and sent by TCP as a single packet. And setting TCP_NODELAY at your end doesn't change how the peer sends the response. Not a real question. – user207421 Dec 04 '12 at 20:21
  • Extra points for the same with requests, a.k.a. python-requests. – Dima Tisnek Jul 26 '13 at 12:39
  • @user207421: Even if the request is transferred to the socket by a single `write()` or `send()`, the socket stack does not know that, and Nagle delays sending the TCP packet in order to combine it with another `write()`/`send()` call that isn't coming. – Ben Voigt Apr 19 '23 at 15:22

3 Answers


If you need access to such a low-level property of the underlying socket, you'll have to override some classes.

First, you'll need to create a subclass of HTTPHandler, which in the standard library looks like this:

class HTTPHandler(AbstractHTTPHandler):

    def http_open(self, req):
        return self.do_open(httplib.HTTPConnection, req)

    http_request = AbstractHTTPHandler.do_request_

As you can see, it uses an HTTPConnection to open the connection, so you'll have to override that class too ;) in order to change its connect() method.

Something like this should be a good start:

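import httplib
import socket
import urllib2
from urllib2 import HTTPHandler


class LowLevelHTTPConnection(httplib.HTTPConnection):

    def connect(self):
        # Let httplib open the TCP connection, then tune the new socket.
        httplib.HTTPConnection.connect(self)
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


class LowLevelHTTPHandler(HTTPHandler):

    def http_open(self, req):
        # Same as HTTPHandler.http_open, but with our connection class.
        return self.do_open(LowLevelHTTPConnection, req)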

urllib2 is smart enough to let you subclass a handler and use it instead of the default one; urllib2.build_opener is made for exactly this:

# Tell urllib2 to use your HTTPHandler in place of the standard HTTPHandler.
urllib2.install_opener(urllib2.build_opener(LowLevelHTTPHandler))
httpRequest = urllib2.Request("http://www....com")
pageContent = urllib2.urlopen(httpRequest)
pageContent.readline()
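On Python 3, urllib2 became urllib.request and httplib became http.client, so the same approach might look roughly like the sketch below. The class names `NoDelayHTTPConnection` and `NoDelayHTTPHandler` are made up for illustration, and only plain HTTP is covered; HTTPS would need an analogous override of `HTTPSConnection` and `https_open`. As far as I know, recent Python 3 versions of http.client already enable TCP_NODELAY inside `connect()`, so there the override mainly makes the intent explicit.

```python
import http.client
import socket
import urllib.request


class NoDelayHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that enables TCP_NODELAY right after connecting."""

    def connect(self):
        super().connect()  # opens self.sock
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


class NoDelayHTTPHandler(urllib.request.HTTPHandler):
    """HTTPHandler that builds its connections from NoDelayHTTPConnection."""

    def http_open(self, req):
        return self.do_open(NoDelayHTTPConnection, req)


# build_opener() drops the default HTTPHandler because our class subclasses it.
opener = urllib.request.build_opener(NoDelayHTTPHandler)
urllib.request.install_opener(opener)
```

Once the opener is installed, `urllib.request.urlopen(...)` is used exactly as before, but http:// URLs go through `NoDelayHTTPHandler`.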
Cédric Julien

For requests, the relevant classes seem to be in requests.packages.urllib3; there are two of them, HTTPConnection and HTTPSConnection. They should be monkey-patchable in place at the module's top level:

import socket

from requests.packages.urllib3 import connectionpool

# Keep references to the original classes so the subclasses can call them.
_HTTPConnection = connectionpool.HTTPConnection
_HTTPSConnection = connectionpool.HTTPSConnection

class HTTPConnection(_HTTPConnection):
    def connect(self):
        _HTTPConnection.connect(self)
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

class HTTPSConnection(_HTTPSConnection):
    def connect(self):
        _HTTPSConnection.connect(self)
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

connectionpool.HTTPConnection = HTTPConnection
connectionpool.HTTPSConnection = HTTPSConnection
Mr. Llama
  • Awesome. I've already seen this kind of monkey-patching for server certificate verification and SNI. I hope they don't conflict. – Dima Tisnek Aug 03 '13 at 15:44
  • As of the time I'm writing this, urllib3 (and hence requests) defaults to TCP_NODELAY. Have a look at `requests.packages.urllib3.connection.HTTPConnection`, specifically `default_socket_options`. – Mr. Llama May 18 '17 at 21:19

Do you have to use urllib2?

Alternatively, you can use httplib2, which sets the TCP_NODELAY option on its sockets.

https://code.google.com/p/httplib2/

It adds a dependency to your project, but seems less brittle than monkey patching.

James Lim