How can I use a SOCKS 4/5 proxy with urllib2 to download a web page?

Mike

3 Answers

You can use the SocksiPy module. Copy the file socks.py into your Python's lib/site-packages directory and you're ready to go.

You must set up socks before importing urllib2. (Alternatively, install the maintained fork with `pip install PySocks`.)

For example:

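```python
import socks
import socket

# Send every new socket through the SOCKS5 proxy at 127.0.0.1:8080
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 8080)
socket.socket = socks.socksocket  # monkey-patch: affects the whole process

import urllib2  # Python 2 only; imported after the patch

print(urllib2.urlopen('http://www.google.com').read())
```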

You can also try the pycurl library or tsocks.

KyungHoon Kim
panweizeng
  • One issue with that: the DNS lookup by urllib doesn't seem to go through the proxy (even with the rdns option and the SOCKS4 type). – OJW Feb 28 '11 at 22:43
  • Just want to note that SocksiPy on SourceForge has some nasty bugs. At minimum, use the fork here: code.google.com/p/socksipy-branch. Since the project appears abandoned, IMO someone should take that branch, rename it and write a blog post so people don't keep using this buggy (and IMO not wonderfully written) lib. – tmc Dec 24 '11 at 02:05
  • I know this is old, but what is wrong with the original SocksiPy? What bugs does it have? – paulm Nov 04 '13 at 22:52
  • Can't download SocksiPy from your link anymore. – Loïc Feb 27 '14 at 23:24
  • @OJW there is another answer here http://stackoverflow.com/a/13214222/288875 which also makes the host name lookups go over the SOCKS proxy. – Andre Holzner May 21 '15 at 21:41
  • Looks like the latest fork of SocksiPy is now here: https://github.com/Anorov/PySocks – Benjamin Smith Jul 16 '20 at 13:26
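The DNS leak OJW describes comes down to what the client puts on the wire. Per RFC 1928, a SOCKS5 CONNECT request may carry a domain name (address type 0x03) that the proxy resolves itself, while plain SOCKS4 can only carry an IPv4 address; the SOCKS4a extension works around this by sending the placeholder address 0.0.0.x (x nonzero) followed by the hostname. The sketch below only builds those request bytes to show what "remote DNS" means. `socks5_connect_request` and `socks4a_connect_request` are hypothetical helpers, not part of SocksiPy or PySocks, and they skip the SOCKS5 method negotiation that precedes the request. Even with rdns enabled, the name presumably still leaks when the HTTP library resolves it to an IP before the SOCKS socket sees it, which is what the answer linked in the comments works around.

```python
import struct


def socks5_connect_request(host, port):
    """SOCKS5 CONNECT request with a domain-name destination (RFC 1928).

    Sending the name (ATYP 0x03) instead of an IP lets the proxy do the DNS lookup.
    """
    name = host.encode("idna")
    if not 0 < len(name) <= 255:
        raise ValueError("domain name must be 1-255 bytes")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), name length
    header = struct.pack("!BBBBB", 0x05, 0x01, 0x00, 0x03, len(name))
    # Destination port in network byte order
    return header + name + struct.pack("!H", port)


def socks4a_connect_request(host, port, user_id=b""):
    """SOCKS4a CONNECT request: DSTIP 0.0.0.1 tells the proxy a hostname follows."""
    # VN=4, CD=1 (CONNECT), DSTPORT, DSTIP=0.0.0.1 (invalid IP signals 4a)
    header = struct.pack("!BBH4B", 0x04, 0x01, port, 0, 0, 0, 1)
    # USERID, NUL terminator, then the hostname, NUL terminator
    return header + user_id + b"\x00" + host.encode("idna") + b"\x00"
```

With plain SOCKS4 the client would have to put a resolved IPv4 address in DSTIP, so the lookup necessarily happens locally.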
An alternative to pan's answer, for when you need to use many different proxies at the same time.

In that case you need to build an opener, just as you would for an HTTP proxy. There is code for such a handler on GitHub: https://gist.github.com/869791

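```python
import socks, urllib2
from socksipyhandler import SocksiPyHandler  # assumes the gist's file is importable
opener = urllib2.build_opener(SocksiPyHandler(socks.PROXY_TYPE_SOCKS4, 'localhost', 9999))
print(opener.open('http://www.whatismyip.com/automation/n09230945.asp').read())
```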
sw.
  • Hey, I was using the code from GitHub. Unfortunately, authentication doesn't work. I've passed the right username and password in socksipyhandler.py, but I get error (3, 'unknown username or invalid password'). I can confirm that my username and password work, since my cURL command works with the same credentials. – harshal.c Nov 23 '15 at 16:27
  • Never mind, figured out the issue: there was a typo in socks.py =). By the way, great work. Thanks a ton! – harshal.c Nov 23 '15 at 16:29
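On Python 3, where urllib2's functionality lives in urllib.request, a per-opener handler in the spirit of SocksiPyHandler can be sketched with the standard library alone. This is not the gist's code: `FactoryHTTPHandler`, `FactoryHTTPConnection` and the `connect_factory` hook are hypothetical names, and only plain http:// URLs are covered. The factory receives the unresolved (host, port) pair, so a SOCKS-backed factory can leave the DNS lookup to the proxy.

```python
import http.client
import urllib.request


class FactoryHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that opens its TCP connection via a pluggable factory.

    connect_factory is a hypothetical hook: a callable taking
    ((host, port), timeout) and returning a connected socket.
    """

    def __init__(self, *args, connect_factory=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._connect_factory = connect_factory

    def connect(self):
        if self._connect_factory is None:
            super().connect()  # no factory: plain direct connection
            return
        # Pass the hostname unresolved, so a SOCKS factory can let the
        # proxy do the DNS lookup instead of resolving it locally.
        self.sock = self._connect_factory((self.host, self.port), self.timeout)


class FactoryHTTPHandler(urllib.request.HTTPHandler):
    """http:// handler bound to one factory, so each opener can use its own proxy."""

    def __init__(self, connect_factory):
        super().__init__()
        self._connect_factory = connect_factory

    def http_open(self, req):
        def make_connection(host, **kwargs):
            return FactoryHTTPConnection(
                host, connect_factory=self._connect_factory, **kwargs)

        return self.do_open(make_connection, req)
```

With PySocks, a factory might create a `socks.socksocket()`, call its `set_proxy(socks.SOCKS5, 'localhost', 9999)` method and then `connect(address)` before returning it; two openers built with two different factories then use two proxies side by side. Passing `urllib.request.ProxyHandler({})` to `build_opener` alongside the handler keeps urllib from also applying proxies from environment variables such as `http_proxy`.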
Since SOCKS is a socket-level proxy, you have to replace the socket object used by urllib2; a monkey-patching solution such as the one in the top answer does exactly that. If monkey patching is not good enough for you, you can subclass, or copy and modify, the urllib2 code from the standard library.

fviktor
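The monkey patch from the top answer (`socket.socket = socks.socksocket`) stays in force for the whole process. A minimal sketch of confining it to one block and restoring the original class afterwards; `patched_socket_class` is a hypothetical helper, and in real use `replacement` would be `socks.socksocket` after configuring the default proxy. The patch is still global while the block runs, so other threads opening sockets at that moment would also go through the proxy.

```python
import contextlib
import socket


@contextlib.contextmanager
def patched_socket_class(replacement):
    """Temporarily replace socket.socket, restoring it even if the body raises."""
    original = socket.socket
    socket.socket = replacement
    try:
        yield replacement
    finally:
        socket.socket = original
```

Code that creates sockets through the socket module at call time, such as `socket.create_connection`, picks up the replacement inside the `with` block and the original class outside it.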