
Is it possible to fetch pages with urllib2 through a SOCKS proxy on a one-SOCKS-server-per-opener basis? I've seen the solution using the setdefaultproxy method, but I need to have different SOCKS proxies in different openers.

There is the SocksiPy library, which works great, but it has to be used this way:

import socks
import socket
socket.socket = socks.socksocket  # every socket created from now on goes through SocksiPy
import urllib2
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "x.x.x.x", y)  # proxy address and port go here

That is, it sets the same proxy for ALL urllib2 requests. How can I have different proxies for different openers?

Fluffy

7 Answers


Try pycurl:

import pycurl
c1 = pycurl.Curl()
c1.setopt(pycurl.URL, 'http://www.google.com')
c1.setopt(pycurl.PROXY, 'localhost')
c1.setopt(pycurl.PROXYPORT, 8080)
c1.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

c2 = pycurl.Curl()
c2.setopt(pycurl.URL, 'http://www.yahoo.com')
c2.setopt(pycurl.PROXY, 'localhost')
c2.setopt(pycurl.PROXYPORT, 8081)
c2.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

c1.perform()  # with no WRITEDATA/WRITEFUNCTION set, the response body is written to stdout
c2.perform()
systempuntoout

Yes, you can. I'm repeating my answer to "How can I use a SOCKS 4/5 proxy with urllib2?": you need to create an opener for every proxy, just as you would with an HTTP proxy. The code that adds this feature to SocksiPy is available on GitHub at https://gist.github.com/869791 and is as simple as:

import socks
import urllib2
# SocksiPyHandler is defined in the gist linked above
opener = urllib2.build_opener(SocksiPyHandler(socks.PROXY_TYPE_SOCKS4, 'localhost', 9999))
print opener.open('http://www.whatismyip.com/automation/n09230945.asp').read()

For more information, I've written an example that runs multiple Tor instances to behave like a rotating proxy: Distributed Scraping With Multiple Tor Circuits

sw.

== EDIT == (the old HTTP proxy example was here)

My fault: urllib2 has no built-in support for SOCKS proxying.

There are some 'hacks' that add SOCKS support to urllib2 (or to the socket object in general) here.
But I doubt that they will work with multiple proxies the way you need.

Unless you want to hook into or subclass urllib2.ProxyHandler, I would suggest going with pycurl.

Shirkrin
  • It ain't working. urllib2.URLError: . The proxy is working fine (so that's not the problem) – Fluffy Mar 31 '10 at 11:53
  • Strange, in my tests (I'm behind an HTTP proxy) it works fine. Did you try multiple simultaneous connections? – Shirkrin Apr 01 '10 at 06:45
  • No, just your snippet without authentication. Are you sure we are both talking about SOCKS proxies? – Fluffy Apr 01 '10 at 13:37

You have only one socket module for all openers, and SOCKS is implemented at the socket level. So, you can't.
I suggest using the pycurl library; it is much more flexible.
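The "one socket for all openers" point comes from how SocksiPy is used: `socket.socket = socks.socksocket` replaces the class for the whole process. A stdlib-only Python 3 sketch of that mechanism, where `RecordingSocket` is a hypothetical stand-in for `socks.socksocket` that only records instances, and the two `create_connection()` calls stand in for connections made by two different openers:

```python
import socket

created = []


class RecordingSocket(socket.socket):
    """Hypothetical stand-in for socks.socksocket: records every socket it creates."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        created.append(self)


# A local listener (created before patching) so the connections below have a target.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(2)
address = listener.getsockname()

original_socket = socket.socket
socket.socket = RecordingSocket  # same kind of patch as "socket.socket = socks.socksocket"
try:
    # Two unrelated callers both pick up the patch, because create_connection()
    # looks up the module-level name "socket" every time it is called.
    first = socket.create_connection(address)
    second = socket.create_connection(address)
finally:
    socket.socket = original_socket
    listener.close()

print(len(created))  # 2
first.close()
second.close()
```

Since there is only one patched class, one global default proxy applies to every socket it creates, whichever opener asked for it.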

Andrew
  • Is there an easy way to use pycurl with Python 2.6 on Windows? – Fluffy Apr 01 '10 at 18:51
  • Nope, it looks like the project is dead (the last update was 2 years ago) and it doesn't compile on Windows with new curl – Andrew Apr 03 '10 at 13:34
  • *nope, (...) it doesn't compile on windows with new curl* How does compiling pycurl with newer versions of curl relate to using pycurl with newer versions of Python? – Piotr Dobrogost Oct 25 '12 at 13:17

If there aren't too many connections being made at once and you need to make them from multiple threads, you might be able to use a threading lock:

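import socks
import socket
import threading
socket.socket = socks.socksocket  # all new sockets go through SocksiPy
import urllib2

lock = threading.Lock()

def get_conn(url, proxy_host, proxy_port):
    # Hold the lock so no other thread can change the process-wide default proxy
    # between setdefaultproxy() and the moment urlopen() opens its socket.
    with lock:
        socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, proxy_host, proxy_port)
        return urllib2.urlopen(url)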

You might also be able to use something like this every time you need to get a connection:

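import imp
# execfile() returns None, so load a separate copy of the urllib2 module instead
my_urllib2 = imp.load_module('my_urllib2', *imp.find_module('urllib2'))
my_urllib2.socket = dummy_class()  # dummy_class needs the socket module's methods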

These are obviously not fantastic solutions, but I've put in my 2¢ anyway :-)
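In Python 3 (where urllib2 became urllib.request) `execfile()` and `imp` are gone, and a separate copy of a module can be made with importlib. A minimal sketch of the "private copy" idea; `load_private_copy` and `DummySocketModule` are hypothetical names. One caveat: in both urllib2 and urllib.request the HTTP sockets are actually opened by the HTTP client module (httplib / http.client), not by urllib itself, so swapping only the copy's own `socket` name would not by itself reroute its connections. The sketch shows the isolation, not a working proxy.

```python
import importlib.util
import socket
import urllib.request


def load_private_copy(module_name):
    """Run a module's source again and return it as a new module object (not put in sys.modules)."""
    spec = importlib.util.find_spec(module_name)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


class DummySocketModule:
    """Hypothetical placeholder; a usable replacement would need the socket module's API."""


my_request = load_private_copy("urllib.request")
my_request.socket = DummySocketModule()  # only the private copy sees the replacement

print(my_request is urllib.request)     # False
print(urllib.request.socket is socket)  # True: the shared module is untouched
```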

cryo

A cumbersome but working solution for using a SOCKS proxy is to set up Privoxy with proxy chaining (forwarding to the SOCKS proxy) and then point HTTP_PROXY at the HTTP proxy that Privoxy provides, via an environment variable or any other way.
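A minimal sketch of that chain in Python 3 (urllib.request is urllib2's successor), assuming two Privoxy instances that each forward to a different SOCKS5 proxy. All addresses and ports are placeholders (8118 is Privoxy's usual listen port), and `privoxy_opener` is a hypothetical helper. Giving each opener its own `ProxyHandler`, instead of the process-wide HTTP_PROXY variable, also yields one proxy per opener.

```python
import urllib.request

# Privoxy config for instance A (placeholder addresses):
#   listen-address  127.0.0.1:8118
#   forward-socks5  /  127.0.0.1:1080  .
# Privoxy config for instance B:
#   listen-address  127.0.0.1:8119
#   forward-socks5  /  127.0.0.1:1081  .
# In forward-socks5, "/" matches every URL and "." means "no parent HTTP proxy".


def privoxy_opener(listen_port):
    """Build an opener that sends http:// and https:// traffic to one local Privoxy instance."""
    proxy = "http://127.0.0.1:%d" % listen_port
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}))


opener_a = privoxy_opener(8118)  # traffic ends up at the SOCKS5 proxy on port 1080
opener_b = privoxy_opener(8119)  # traffic ends up at the SOCKS5 proxy on port 1081
```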

ccpizza

You could do it by setting the environment variable HTTP_PROXY in the following format:

user:pass@proxy:port

or, if you use a .bat/.cmd file, add this before calling the script:

set HTTP_PROXY=user:pass@proxy:port

I use such a .cmd file to make easy_install work behind a proxy.
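Python's urllib reads these variables on its own. A small Python 3 sketch (`proxies_from_env` is a hypothetical helper, and 3128 is a placeholder port) showing what `urllib.request.getproxies()` makes of HTTP_PROXY. Note that urllib treats the value as an ordinary HTTP proxy for http:// URLs, so this covers HTTP proxies rather than SOCKS.

```python
import os
import urllib.request
from unittest import mock


def proxies_from_env(env):
    """Return the proxy mapping urllib.request.getproxies() builds from exactly these variables."""
    with mock.patch.dict(os.environ, env, clear=True):
        return urllib.request.getproxies()


# The part of the name before "_PROXY" becomes the URL scheme the proxy is used for.
print(proxies_from_env({"HTTP_PROXY": "user:pass@proxy:3128"}))  # {'http': 'user:pass@proxy:3128'}
```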

Dmitry Kochkin