12

How can I set a proxy for the latest urllib in Python 3? I am doing the following:

from urllib import request as urlrequest
ask = urlrequest.Request(url)     # note that Request is spelled with a capital R here, unlike in previous versions
response = urlrequest.urlopen(ask)
response.read()

I tried adding a proxy as follows:

ask=urlrequest.Request.set_proxy(ask,proxies,'http')

However, I don't know whether this is correct, since I am getting the following error:

336     def set_proxy(self, host, type):
--> 337         if self.type == 'https' and not self._tunnel_host:
    338             self._tunnel_host = self.host
    339         else:

AttributeError: 'NoneType' object has no attribute 'type'
bapap
gm1

4 Answers

21

You should be calling set_proxy() on an instance of class Request, not on the class itself:

from urllib import request as urlrequest

proxy_host = 'localhost:1234'    # host and port of your proxy
url = 'http://www.httpbin.org/ip'

req = urlrequest.Request(url)
req.set_proxy(proxy_host, 'http')

response = urlrequest.urlopen(req)
print(response.read().decode('utf8'))
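
As a hedged aside: in CPython, `set_proxy()` modifies the request in place and returns `None`, so assigning its result back to the request variable (as in the question) would leave that variable set to `None`, which is consistent with the `'NoneType'` error shown. The sketch below sends nothing over the network; the proxy address is a placeholder.

```python
from urllib import request as urlrequest

req = urlrequest.Request('http://www.httpbin.org/ip')

# set_proxy() mutates req and returns None, so do not assign its result.
result = req.set_proxy('localhost:1234', 'http')

print(result)           # None
print(req.host)         # localhost:1234 (the connection now targets the proxy)
print(req.has_proxy())  # True
```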
mhawke
  • Your solution is very helpful; however, I am not able to put it into practice due to a tunneling issue. I should mention that I just passed a public proxy from HMA. It would be really helpful if someone could help solve this issue. – gm1 Jan 03 '16 at 14:09
  • @gm1: re your second comment: you need to use the correct HTTP scheme. If you set an HTTPS proxy, it will only be used when you access HTTPS URLs. In your example the proxy is not used because the URL is HTTP, not HTTPS. Change the URL to `https://www.httpbin.org/ip` and it will use the proxy (which should be `https://195.154.231.43:3128`). – mhawke Jan 03 '16 at 22:12
  • @gm1: That's good. If this answer was useful you can upvote it. If it is correct, you can accept it. See http://stackoverflow.com/help/someone-answers – mhawke Jan 04 '16 at 23:44
  • Hi, I would love to; however, I am not able to because I don't have enough reputation points. – gm1 Jan 05 '16 at 11:18
  • Hi, sure, no problem, but how can I do that? I can't find any functionality for it. Is there a button or something? Thanks. – gm1 Jan 05 '16 at 12:53
  • I accepted it; silly me, it was just so obvious. Meanwhile, something weird happened. It basically worked until now, when I get: `URLError: ` – gm1 Jan 05 '16 at 15:00
13

I needed to disable the proxy in our company environment, because I wanted to access a server on localhost. I could not disable the proxy server with the approach from @mhawke (tried to pass {}, None and [] as proxies).

This worked for me (can also be used for setting a specific proxy, see comment in code).

import urllib.request as request

# disable the proxy by passing an empty dict
proxy_handler = request.ProxyHandler({})
# alternatively, you could set a proxy for http with
# proxy_handler = request.ProxyHandler({'http': 'http://www.example.com:3128/'})

opener = request.build_opener(proxy_handler)

url = 'http://www.example.org'

# open the website with the opener
response = opener.open(url)
data = response.read().decode('utf8')
print(data)
Alexander Taubenkorb
  • Does this download the file to disk? – Theo F Oct 11 '21 at 14:25
  • @TheoF It decodes the response as UTF-8 and stores it in the `data` variable. After that the output is printed. It does not store it in a file on your disk. If you want to save the response to disk then, depending on the response, you might skip the decoding and write it to a file afterwards (see e.g. https://www.w3schools.com/python/python_file_write.asp) – Alexander Taubenkorb Oct 19 '21 at 11:27
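
Saving the response to disk, as discussed in the comments above, could look roughly like the following. This is a minimal sketch rather than part of the original answer: the helper name `download_to_file` and its `proxies` parameter are hypothetical, and the body is written as raw bytes without decoding.

```python
import shutil
import urllib.request


def download_to_file(url, path, proxies=None):
    """Stream the response body for `url` into the file at `path`.

    `proxies` is handed to ProxyHandler: {} disables proxies, while None
    falls back to the system/environment proxy settings.
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    # Copy the raw bytes in chunks, so nothing is decoded or held fully in memory.
    with opener.open(url) as response, open(path, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
```

For example, `download_to_file('http://www.example.org', 'page.html', proxies={})` would fetch the page without any proxy and write it to `page.html`.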
5

urllib automatically detects proxies configured in the environment, so you can just set the HTTP_PROXY variable, either in your shell, e.g. for Bash:

export HTTP_PROXY=http://proxy_url:proxy_port

or from Python, e.g.:

import os
os.environ['HTTP_PROXY'] = 'http://proxy_url:proxy_port'

Note from the urllib docs: "HTTP_PROXY[environment variable] will be ignored if a variable REQUEST_METHOD is set; see the documentation on getproxies()"
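
To check what urllib actually derives from the environment, one option is to inspect the proxy mapping directly. The sketch below is an illustration under assumptions: the helper `proxies_seen_by_urllib` and the proxy URL are hypothetical, and it calls `urllib.request.getproxies_environment()`, the environment-only helper in CPython, so the result does not depend on OS-level proxy settings. The environment is patched temporarily and restored afterwards.

```python
import os
import urllib.request
from unittest import mock


def proxies_seen_by_urllib(env):
    """Return the proxy mapping urllib derives from the given environment dict."""
    # Temporarily replace os.environ with `env`; it is restored on exit.
    with mock.patch.dict(os.environ, env, clear=True):
        return urllib.request.getproxies_environment()


# HTTP_PROXY alone is picked up as the proxy for the 'http' scheme.
plain = proxies_seen_by_urllib({'HTTP_PROXY': 'http://proxy.example.com:3128'})
print(plain)

# With REQUEST_METHOD set (as in a CGI script), HTTP_PROXY is ignored.
cgi = proxies_seen_by_urllib({
    'HTTP_PROXY': 'http://proxy.example.com:3128',
    'REQUEST_METHOD': 'GET',
})
print(cgi)
```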

Pierz
  • This is not the correct answer, as it sets the proxy globally, which can affect other processes outside Python; more specifically, it will break functionality that relies on localhost/loopback (like the AWS EC2 metadata IP address). The question is specifically about `urllib.request`, not the entire OS. – Stof Sep 11 '20 at 03:50
  • This works as an approach. The choice is up to the user - the `HTTP_PROXY` environment variable can be set per application, process, or globally. – Pierz Mar 24 '21 at 19:16
  • In my experience, os.environ has the correct proxy settings, but urllib is not picking them up. I have to set the proxy explicitly using urllib.request.ProxyHandler for it to work. – jrp Mar 25 '21 at 18:39
  • In the `urllib` docs I linked to in the answer, they explicitly state that it **does use the environment variables**. I've added a note from the docs to the answer about certain caveats. – Pierz Mar 27 '21 at 16:44
  • Important note: It blows my mind that this is still the case in 2023 but, as per their [documentation](https://docs.python.org/3/howto/urllib2.html#proxies), _"Currently urllib.request does not support fetching of **https** locations through a proxy"_. – Seth Jun 15 '23 at 13:52
1
import urllib.request


def set_http_proxy(proxy):
    """Install a global opener that applies the given proxy setting."""
    if proxy is None:  # Use system default setting
        proxy_support = urllib.request.ProxyHandler()
    elif proxy == '':  # Don't use any proxy
        proxy_support = urllib.request.ProxyHandler({})
    else:  # Use the given proxy for both http and https URLs
        proxy_support = urllib.request.ProxyHandler({'http': proxy, 'https': proxy})
    opener = urllib.request.build_opener(proxy_support)
    # install_opener() makes this opener the default used by urllib.request.urlopen()
    urllib.request.install_opener(opener)


proxy = 'user:pass@ip:port'  # placeholder: replace with your proxy's credentials and address
set_http_proxy(proxy)

url = 'https://www.httpbin.org/ip'
req = urllib.request.Request(url)
response = urllib.request.urlopen(req)
html = response.read()
print(html)
  • While this code snippet may solve the question, [including an explanation](https://meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – Ermiya Eskandary May 15 '22 at 12:56
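
By way of explanation, the three branches in the answer differ only in what is passed to `ProxyHandler`. The sketch below is an illustration, not the answerer's code: the helper name `make_proxy_handler` and the proxy URL are hypothetical, and the `proxies` attribute it prints is a CPython implementation detail rather than documented API. It builds a private opener instead of calling `install_opener()`, so plain `urlopen()` calls elsewhere in the process are left alone.

```python
import urllib.request


def make_proxy_handler(proxy):
    """Mirror the three cases above and return the resulting ProxyHandler.

    None -> system/environment settings, '' -> no proxy at all,
    anything else -> that proxy for both http and https URLs.
    """
    if proxy is None:
        return urllib.request.ProxyHandler()
    if proxy == '':
        return urllib.request.ProxyHandler({})
    return urllib.request.ProxyHandler({'http': proxy, 'https': proxy})


# Hypothetical proxy address, used only for illustration.
handler = make_proxy_handler('http://proxy.example.com:3128')
print(handler.proxies)

# A private opener: only requests made via opener.open(...) use the proxy,
# because nothing is installed globally.
opener = urllib.request.build_opener(handler)
```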