Proxy with urllib2

Question

I open URLs with:

site = urllib2.urlopen('http://google.com')

I want to connect the same way through a proxy. Somewhere I found this suggestion:

site = urllib2.urlopen('http://google.com', proxies={'http':'127.0.0.1'})

but that didn't work either.

I know urllib2 has something like a proxy handler, but I can't recall that function.

ZelluX · Accepted Answer · score 143 · 2011-01-25T11:43:27.687

import urllib2

# After install_opener(), every urllib2.urlopen() call for http URLs uses the proxy
proxy = urllib2.ProxyHandler({'http': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')

edited Jan 25 '11 at 11:43

answered Sep 20 '09 at 02:49

Hi, @ZelluX, I only want the proxies setting enabled on some function, does that mean I have to install and uninstall the opener for every invocation of the function? – satoru Nov 11 '11 at 08:42
@Satoru.Logic Maybe you can write a decorator to simplify the install/uninstall process? – ZelluX Nov 11 '11 at 13:25
Seems there's no `uninstall` method in `urllib2`, but we can make one-time proxy settings: instead of installing the opener, we create a `Request` object and use an opener to `open` it. – satoru Nov 11 '11 at 13:39
@Satoru.Logic: I think the traditional approach is to configure an environment variable like `HTTP_PROXY` and then check in your code if it is defined using `os.environ["HTTP_PROXY"]`. – ccpizza Sep 10 '12 at 10:43
Don't forget the port number, e.g. 3128: `proxy = urllib2.ProxyHandler({'http': '127.0.0.1:3128'})` – J'e Oct 20 '14 at 22:22
@satoru, you can mimic uninstall through [this](http://stackoverflow.com/questions/2276689/how-do-i-unit-test-a-module-that-relies-on-urllib2#comment42948730_2276884) – Sergey M Dec 01 '14 at 23:33
how does this know what port to use?? – MikeSchem Apr 24 '17 at 22:27
For everybody using the solution above and wondering why the created `ProxyHandler` isn't used: I needed to use this solution for getting things working because I created an additional `context` for SSL verification: https://stackoverflow.com/a/24766345/520162 – eckes Jan 15 '18 at 08:43
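
On the install/uninstall question raised in the comments above, one possible approach is a context manager that installs a proxy opener only while a block runs and then puts back whatever was installed before. This is a minimal sketch for Python 3, where `urllib2` became `urllib.request`. The name `temporary_proxy` and the address `127.0.0.1:3128` are made up for illustration, and the code relies on the private attribute `urllib.request._opener`, a CPython implementation detail that may change.

```python
import contextlib
import urllib.request


@contextlib.contextmanager
def temporary_proxy(proxies):
    """Install a proxy opener for the duration of a with-block.

    Hypothetical helper for illustration; not part of urllib.
    """
    # Assumption: _opener is the private module global set by install_opener().
    previous = urllib.request._opener
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    urllib.request.install_opener(opener)
    try:
        yield opener
    finally:
        # Put the earlier opener back; if it was None, urlopen() builds
        # a default opener again on its next call.
        urllib.request.install_opener(previous)


# Usage (placeholder proxy address, no request is sent by this file):
# with temporary_proxy({"http": "http://127.0.0.1:3128"}):
#     urllib.request.urlopen("http://www.example.com")
```

The opener is still installed process-wide while the block runs, so other threads calling `urlopen()` in that window go through the proxy too. Calling `opener.open(url)` on an opener that is never installed avoids that.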

dcrosta · Answer 2 · score 19 · 2009-09-20T02:51:32.163

You have to install a ProxyHandler:

urllib2.install_opener(
    urllib2.build_opener(
        urllib2.ProxyHandler({'http': '127.0.0.1'})
    )
)
urllib2.urlopen('http://www.google.com')

edited Sep 20 '09 at 02:51

answered Sep 20 '09 at 02:34

I get File "D:/Desktop/Desktop/mygoogl", line 64, site = url.urlopen('google.com) File "C:\Python26\lib\urllib2.py", line 124, in urlopen return _opener.open(url, data, timeout) AttributeError: ProxyHandler instance has no attribute 'open' – Chris Stryker Sep 20 '09 at 02:43
I missed a call to urllib2.build_opener() – dcrosta Sep 20 '09 at 02:51

score 12 · Answer 3 · edited Jun 04 '14 at 16:21

You can set proxies using environment variables.

import os
os.environ['http_proxy'] = '127.0.0.1'
os.environ['https_proxy'] = '127.0.0.1'

urllib2 then adds the proxy handlers automatically. You need to set a proxy for each protocol separately; otherwise requests for a protocol without one bypass the proxy, as the example below shows.

For example:

proxy = urllib2.ProxyHandler({'http': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')
# the next line will not go through the proxy, because no 'https' proxy is set
urllib2.urlopen('https://www.google.com')

Instead:

proxy = urllib2.ProxyHandler({
    'http': '127.0.0.1',
    'https': '127.0.0.1'
})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
# this way both http and https requests go through the proxy
urllib2.urlopen('http://www.google.com')
urllib2.urlopen('https://www.google.com')
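
To make the per-protocol point concrete, here is a small Python 3 sketch (`urllib2` became `urllib.request` there). The helper name `build_proxy_opener` and the proxy address are assumptions for illustration. It maps one proxy URL to every scheme listed and returns an opener without installing it globally.

```python
import urllib.request


def build_proxy_opener(proxy_url, schemes=("http", "https")):
    """Return an opener that sends each listed scheme through proxy_url.

    Hypothetical helper for illustration; not part of urllib.
    """
    proxies = {scheme: proxy_url for scheme in schemes}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


# Placeholder address; nothing is sent until opener.open(...) is called.
opener = build_proxy_opener("http://127.0.0.1:3128")

# Look up the ProxyHandler the opener ended up with and list its schemes.
proxy_handler = next(
    h for h in opener.handlers if isinstance(h, urllib.request.ProxyHandler)
)
print(sorted(proxy_handler.proxies))  # ['http', 'https']
```

Passing `schemes=("http",)` would reproduce the first example in this answer, where https requests skip the proxy.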

Shouldn't you have used e.g. os.environ['http_proxy'] in your lower two sets of examples? — Jonathan Benn, Apr 13 '17 at 20:51

score 7 · Answer 4 · answered Mar 16 '12 at 14:55

To use the default system proxies (e.g. from the http_proxy environment variable), the following works for the current request (without installing it into urllib2 globally):

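import urllib2

url = 'http://www.example.com/'
# With no arguments, ProxyHandler picks up the system proxy settings
proxy = urllib2.ProxyHandler()
opener = urllib2.build_opener(proxy)
# Requests made through this opener use the proxy; nothing is installed globally
in_ = opener.open(url)
in_.read()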
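
As a rough Python 3 illustration of where those default proxies come from: `ProxyHandler()` with no argument falls back to `urllib.request.getproxies()`, which reads the `<scheme>_proxy` environment variables. The proxy address below is a placeholder, and the variables are set from inside the script only to keep the sketch self-contained.

```python
import os
import urllib.request

# Placeholder proxy address, set only for this process.
os.environ["http_proxy"] = "http://127.0.0.1:3128"
os.environ["https_proxy"] = "http://127.0.0.1:3128"

# getproxies() reads the <scheme>_proxy environment variables.
detected = urllib.request.getproxies()
print(detected.get("https"))

# With no argument, ProxyHandler falls back to getproxies().
handler = urllib.request.ProxyHandler()
opener = urllib.request.build_opener(handler)  # not installed globally
print(handler.proxies.get("https"))
```

In practice the variables would normally be set in the shell before starting the script, so the code itself needs no proxy-specific values.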

score 3 · Answer 5 · answered Oct 08 '14 at 09:48

In addition to the accepted answer: my script gave me an error:

File "c:\Python23\lib\urllib2.py", line 580, in proxy_open
    if '@' in host:
TypeError: iterable argument required

The solution was to add http:// in front of the proxy string:

proxy = urllib2.ProxyHandler({'http': 'http://proxy.xy.z:8080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')
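
One way to guard against a scheme-less proxy string is to normalise it before handing it to `ProxyHandler`. This is a minimal Python 3 sketch; `with_scheme` is a hypothetical helper, and defaulting to `http` is an assumption that may not suit every proxy.

```python
import urllib.request


def with_scheme(proxy, default_scheme="http"):
    """Prefix proxy with default_scheme:// unless it already names a scheme."""
    if "://" in proxy:
        return proxy
    return "{}://{}".format(default_scheme, proxy)


print(with_scheme("proxy.xy.z:8080"))         # http://proxy.xy.z:8080
print(with_scheme("http://proxy.xy.z:8080"))  # http://proxy.xy.z:8080

# The normalised string can then be passed on as usual.
handler = urllib.request.ProxyHandler({"http": with_scheme("proxy.xy.z:8080")})
```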

score 3 · Answer 6 · answered Jul 03 '16 at 10:19

You can also use the requests library to access a web page through a proxy. Python 3 code:

>>> import requests
>>> url = 'http://www.google.com'
>>> proxy = '169.50.87.252:80'
>>> requests.get(url, proxies={"http":proxy})
<Response [200]>

More than one proxy can also be configured, one per URL scheme, since the proxies dict holds a single entry for each scheme:

>>> proxy1 = '169.50.87.252:80'
>>> proxy2 = '89.34.97.132:8080'
>>> requests.get(url, proxies={"http": proxy1, "https": proxy2})
<Response [200]>

Waqar Detho

Hi @WaqarDetho How will one know what proxy addresses to use? Is it just some random ip addresses? – Aman Singh Mar 05 '20 at 03:55
Hi @AmanSingh I did this a long time ago, but as far as I remember I found these proxy addresses on the internet and put them into the code manually. – Waqar Detho Apr 24 '20 at 11:57

score 0 · Answer 7 · answered Apr 25 '16 at 08:09

In addition, set the proxy for the command-line session. Open a command line where you want to run your script:

netsh winhttp set proxy YourProxySERVER:yourProxyPORT

Then run your script in that terminal.

pensebien
