
I'm using SocksiPy with urllib2 in Python 2.6. Everything works except timeouts when I hit a hanging URL. Neither the urllib2 timeout arguments nor the global socket default timeout has any effect. I've also tried setting the timeout several different ways in the subclassed handlers below, with no success. Any ideas?

Here is a test script (assuming that you have the socksipy project installed and are adding it to your system path):

import os, sys
import httplib
sys.path.append( "/parent/path/to/socksipy/project" )
import socks # import socksipy
import socket
socket.setdefaulttimeout(30.0)
import urllib2

class SocksiPyConnection(httplib.HTTPConnection):
    def __init__(self, proxytype, proxyaddr, proxyport = None, rdns = False, username = None, password = None, *args, **kwargs):
        self.proxyargs = (proxytype, proxyaddr, proxyport, rdns, username, password)
        httplib.HTTPConnection.__init__(self, *args, **kwargs)

    def connect(self):
        self.sock = socks.socksocket()
        self.sock.setproxy(*self.proxyargs) 
        # accept integer timeouts too (e.g. timeout=30), not only floats
        if isinstance(self.timeout, (int, float)):
            self.sock.settimeout(self.timeout)
        self.sock.connect((self.host, self.port))

class SocksiPyHandler(urllib2.HTTPHandler):
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kw = kwargs
        urllib2.HTTPHandler.__init__(self)

    def http_open(self, req):
        def build(host, port=None, strict=None, timeout=0):
            conn = SocksiPyConnection(*self.args, host=host, port=port, strict=strict, timeout=timeout, **self.kw)
            return conn
        return self.do_open(build, req)

if __name__ == '__main__':

    #
    # this one works for non-hanging URL
    #
    proxyhost = "responder.w2"
    proxyport = 1050
    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, proxyhost, int(proxyport))
    socket.socket = socks.socksocket
    resp = urllib2.urlopen("http://www.google.com", timeout=30.0)
    # hang here
    print "returned 1"


    #
    # one way to go about it for a hanging URL
    #
    proxyhost = "responder.w2"
    proxyport = 1050
    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, proxyhost, int(proxyport))
    socket.socket = socks.socksocket
    resp = urllib2.urlopen("http://erma.orr.noaa.gov/cgi-bin/mapserver/charts?version=1.1.1&service=wms&request=GetCapabilities", timeout=30.0)
    # it hangs here
    print "returned 2"


    #  
    # another way to go about it for hanging URL
    #
    proxyhost = "responder.w2"
    proxyport = 1050
    opener = urllib2.build_opener(SocksiPyHandler(socks.PROXY_TYPE_SOCKS5, proxyhost, int(proxyport)) )
    resp = opener.open("http://erma.orr.noaa.gov/cgi-bin/mapserver/charts?version=1.1.1&service=wms&request=GetCapabilities", timeout=30.0)
    # it hangs here
    print "returned 3"
sudobangbang
  • I'm trying to do exactly what this person is doing, but I need timeouts: http://stackoverflow.com/questions/2317849/how-can-i-use-a-socks-4-5-proxy-with-urllib2/2339260#2339260 – sudobangbang Jul 04 '12 at 19:01
  • Would using the requests module help? It's a lot simpler. http://docs.python-requests.org/en/latest/user/quickstart/#timeouts – hughdbrown Jul 04 '12 at 19:03
  • @hughdbrown: I might rewrite everything using the requests module if it supports SOCKS5 proxies – sudobangbang Jul 04 '12 at 19:34
  • I ran this code and received a time-out exception on the second URL after 30 seconds: urllib2.URLError: – del Jul 26 '12 at 02:29
  • Maybe this answer can help you: http://stackoverflow.com/questions/8464391/what-should-i-do-if-socket-setdefaulttimeout-is-not-working – Simon Steinberger Aug 02 '12 at 13:56

2 Answers


This worked for me:

socks.socket.setdefaulttimeout(7)

You should avoid editing the Python socks library directly.
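As a minimal sketch of why that one-liner can help: assuming socks.py does a plain `import socket` at the top, `socks.socket` is just the standard `socket` module, so the call above sets the process-wide default timeout. The snippet below uses only the standard library (no proxy involved) to show that a default set this way is picked up by sockets created afterwards.

```python
import socket

# Set the process-wide default; sockets created after this inherit it.
socket.setdefaulttimeout(7)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
timeout_seen = sock.gettimeout()  # the timeout this new socket started with
sock.close()

# Restore the interpreter default (None means blocking mode, no timeout).
socket.setdefaulttimeout(None)
```

Sockets that already existed before the call keep whatever timeout they had, so the default has to be set before the connection is opened.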

bummi

It turns out the "hanging/timeout" issue I mentioned above was in fact a blocking issue in SocksiPy's socks.py. If you hit an endpoint that still responds with 200 but sends no data (0 bytes), socks.py blocks, because __recvall loops until it has received the requested number of bytes. Here is the before and after for adding your own timeout:

socks.py BEFORE:

def __recvall(self, bytes):
    """__recvall(bytes) -> data
    Receive EXACTLY the number of bytes requested from the socket.
    Blocks until the required number of bytes have been received.
    """
    data = ""
    while len(data) < bytes:
       data = data + self.recv(bytes-len(data))
    return data
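One way a loop like the one above can fail to return, shown as a minimal sketch rather than a trace of the original failure: once the peer closes the connection, `recv` returns an empty result instead of raising, so `len(data)` never grows. Here `socket.socketpair()` is only a local stand-in for the proxy connection (an assumption for illustration; it is not available on every platform in Python 2).

```python
import socket

# A connected pair of local sockets stands in for client <-> proxy.
left, right = socket.socketpair()

left.sendall(b"ab")  # the peer sends fewer bytes than the reader wants
left.close()         # ...and then goes away

first = right.recv(4)   # the two bytes that were actually sent
second = right.recv(4)  # peer is gone: empty result, no exception
right.close()
```

A `while len(data) < bytes` loop that keeps appending `second` would spin forever, which is why an empty read needs to be treated as an error.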

socks.py AFTER with timeout:

def __recvall(self, bytes):
    """__recvall(bytes) -> data
    Receive EXACTLY the number of bytes requested from the socket.
    Raises socket.timeout if the full number of bytes is not received.
    """
    data = self.recv(bytes, socket.MSG_WAITALL)
    if type(data) not in (str, unicode) or len(data) != bytes:
        raise socket.timeout('timeout')
    return data
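An alternative that keeps the original loop shape, offered as a sketch and not as SocksiPy's own code: read in a loop, but stop with an error as soon as a read comes back empty, and let any timeout set on the socket surface as `socket.timeout`. The name `recv_exactly` is hypothetical, and the function takes the socket as an argument so it can be tried without patching socks.py.

```python
import socket

def recv_exactly(sock, nbytes):
    """Read exactly nbytes from sock, or raise instead of looping forever."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        # With a timeout set on sock, recv raises socket.timeout when it expires.
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty read means the peer closed the connection early.
            raise socket.error("connection closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Unlike the `MSG_WAITALL` version, this does not depend on platform support for that flag, and it distinguishes a closed connection from a timeout.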
sudobangbang