
I have a program that checks a website, and I want to know how I can check it through a proxy in Python...

This is the code, just as an example:

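import time
import urllib  # Python 2

website = 'http://www.example.com'  # placeholder URL

while True:
    try:
        h = urllib.urlopen(website)
        break
    except IOError:  # urllib.urlopen raises IOError when the request fails
        print '['+time.strftime('%Y/%m/%d %H:%M:%S')+'] '+'ERROR. Trying again in a few seconds...'
        time.sleep(5)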
Ciro Santilli OurBigBook.com
Bruno 'Shady'

4 Answers


By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:

$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py  # Using http://myproxy.example.com:1234 as a proxy
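The same environment variable is honoured by Python 3's urllib.request, which reads it through urllib.request.getproxies(). A minimal sketch: the proxy address is a placeholder, and setting the variable from inside the script is only for demonstration (exporting it in the shell, as above, is the usual way).

```python
import os
import urllib.request

# Placeholder proxy address, for illustration only. Normally you would
# export http_proxy in the shell instead of setting it here.
os.environ['http_proxy'] = 'http://myproxy.example.com:1234'

# getproxies() returns the scheme -> proxy mapping that a default
# urllib.request.ProxyHandler() picks up from the environment.
detected = urllib.request.getproxies()
print(detected.get('http'))
```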

If you instead want to specify a proxy inside your application, you can pass a proxies argument to urlopen (this works with Python 2's urllib.urlopen):

import urllib  # Python 2

proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
urllib.urlopen("http://www.google.com", proxies=proxies)
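The proxies argument exists only in Python 2's urllib.urlopen. A rough Python 3 equivalent, assuming the same placeholder proxy, hands the dict to a ProxyHandler and sends the request through the resulting opener. The network call is left commented out so the sketch runs offline.

```python
import urllib.request

# Same placeholder proxy as in the Python 2 example.
proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])

# In Python 3 the proxies dict goes to a ProxyHandler instead of to urlopen().
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))

# Only requests made through this opener use the proxy; the global
# urlopen() is unaffected because install_opener() is not called.
# response = opener.open("http://www.google.com")
```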

Edit: If I understand your comments correctly, you want to try several proxies and print each proxy as you try it. How about something like this?

import time
import urllib  # Python 2

candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print("Trying HTTP proxy %s" % proxy)
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print("Got URL using proxy %s" % proxy)
        break
    except IOError:  # raised by urllib.urlopen when the proxy or request fails
        print("Trying next proxy in 5 seconds")
        time.sleep(5)
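For Python 3, where urlopen has no proxies argument, the same loop can be sketched with one opener per proxy. The function name fetch_via_proxies and its delay and timeout parameters are illustrative choices, not part of any library.

```python
import http.client
import time
import urllib.request


def fetch_via_proxies(url, candidate_proxies, delay=5, timeout=10):
    """Try each HTTP proxy in order.

    Returns (proxy, body) for the first proxy that works, or None if all fail.
    """
    for proxy in candidate_proxies:
        print("Trying HTTP proxy %s" % proxy)
        # A separate opener per proxy, so no global urllib state is changed.
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({'http': proxy}))
        try:
            with opener.open(url, timeout=timeout) as response:
                body = response.read()
        except (OSError, http.client.HTTPException) as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            print("Proxy %s failed (%s); trying next proxy in %d seconds"
                  % (proxy, exc, delay))
            time.sleep(delay)
            continue
        print("Got URL using proxy %s" % proxy)
        return proxy, body
    return None
```

For example, fetch_via_proxies("http://www.google.com", candidate_proxies) prints each proxy as it is tried and returns the one that worked together with the page body.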
Oren
Pär Wieslander
  • Using your example, how can I print which proxy it is using at the time the urlopen call happens? – Bruno 'Shady' Jul 02 '10 at 18:36
  • @Shady: Just throw in a `print` statement that prints the value of `proxies['http']`. Take a look at my updated example to see how it could be done. – Pär Wieslander Jul 02 '10 at 18:40
  • OK, thanks, but what if I want more proxies, lots of them, for example 10 proxies, tried one after the other? – Bruno 'Shady' Jul 02 '10 at 18:48
  • @Shady: You mean that you want to try a new proxy for each call until you find one that works? Change the `proxies` argument for each call to `urlopen`, passing in a new proxy for each call. – Pär Wieslander Jul 02 '10 at 18:48
  • Actually, I want to check the website through several proxies, like 10, and then repeat the process with these proxies, but the question here is HOW I can print which proxy urlopen is using at the time of the check – Bruno 'Shady' Jul 02 '10 at 19:00
  • @Shady: I've added another example that uses several proxies. Is this what you're looking for? – Pär Wieslander Jul 02 '10 at 19:16
  • Yes, thank you... now I just need a really good proxy list =p – Bruno 'Shady' Jul 02 '10 at 19:26
  • Wieslander, I'm getting an error for every proxy I use. What could it be? – Bruno 'Shady' Jul 02 '10 at 22:29
  • @Shady: That's impossible to tell without more details. I would start by verifying that the proxies actually work by trying them out in a web browser first. If they **don't** work in the browser either, then the problem is probably with the proxies or in the network. If they **do** work in the browser, you'll probably have to double check that you're actually passing the proxy settings correctly to `urlopen`. – Pär Wieslander Jul 02 '10 at 23:31
  • Wieslander, I've just tested the proxy and it worked in Firefox. I got it from here (http://www.samair.ru/proxy/time-01.htm). Could you take a look at my script to see what is happening? I would appreciate it =) (http://pastebin.com/TgZw7xvV) – Bruno 'Shady' Jul 03 '10 at 00:13
  • My `response = urllib.urlopen(url, proxies = proxies}` doesn't work. It doesn't give any output. Any idea? – Yogesh D Jul 11 '17 at 16:48
  • In Python 3, `urlopen` doesn't have a `proxies` parameter; that approach is outdated: > Proxy handling, which was done by passing a dictionary parameter to urllib.urlopen, can be obtained by using ProxyHandler objects. – secsilm Jun 04 '21 at 06:41
  • And you can use the first environment variable method. It's the simplest. – secsilm Jun 04 '21 at 06:42

Python 3 is slightly different here. It tries to auto-detect proxy settings (for example from the http_proxy and https_proxy environment variables), but if you need specific or manual proxy settings, consider this kind of code:

#!/usr/bin/env python3
import urllib.request

url = 'http://www.example.com/'  # placeholder: the page you want to fetch

proxy_support = urllib.request.ProxyHandler({'http' : 'http://user:pass@server:port',
                                             'https': 'https://...'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)  # every later urlopen() call now uses these proxies

with urllib.request.urlopen(url) as response:
    html = response.read()  # ... process the page here

Refer also to the relevant section in the Python 3 docs
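install_opener() changes the proxy for every later urlopen() call in the process. To choose a proxy for a single request instead (which also makes it easy to print which proxy each check uses), urllib.request.Request.set_proxy() can be used. A sketch with a placeholder URL and proxy, assuming no http_proxy variable is set in the environment (if one is, the default ProxyHandler applies that proxy instead):

```python
import urllib.request

# Placeholder target URL and proxy host:port, for illustration only.
req = urllib.request.Request('http://www.example.com/')
req.set_proxy('myproxy.example.com:1234', 'http')

# set_proxy() points req.host at the proxy and keeps the full URL as the
# request target, so the proxy in use can be logged per request.
print("Fetching %s via proxy %s" % (req.full_url, req.host))

# with urllib.request.urlopen(req) as response:  # network call, not run here
#     html = response.read()
```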

DomTomCat

Here is example code showing how to use urllib to connect via a proxy:

import urllib.request

authinfo = urllib.request.HTTPBasicAuthHandler()

proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)

# install it
urllib.request.install_opener(opener)

f = urllib.request.urlopen('http://www.google.com/')
daz

For http and https use:

proxies = {'http':'http://proxy-source-ip:proxy-port',
           'https':'https://proxy-source-ip:proxy-port'}

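A dict holds only one proxy per scheme, so more proxies cannot be added under the same key. If a key appears twice, as in

proxies = {'http':'http://proxy1-source-ip:proxy-port',
           'http':'http://proxy2-source-ip:proxy-port'}

only the last value is kept. To try several proxies, pass them one at a time, as in the loop in the first answer.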
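A Python dict keeps only one value per key, so a dict with two 'http' entries ends up holding a single proxy. A quick check with the same placeholder addresses, plus one way (an illustrative choice) to keep several proxies for the same scheme: store them in a list and build a one-entry dict per attempt.

```python
# Two entries with the same key: Python keeps only the last one.
proxies = {'http': 'http://proxy1-source-ip:proxy-port',
           'http': 'http://proxy2-source-ip:proxy-port'}
print(len(proxies), proxies['http'])  # 1 http://proxy2-source-ip:proxy-port

# Several proxies for one scheme: a list, turned into one dict per attempt.
candidate_proxies = ['http://proxy1-source-ip:proxy-port',
                     'http://proxy2-source-ip:proxy-port']
per_attempt = [{'http': p} for p in candidate_proxies]
```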

Usage (Python 2's urllib.urlopen):

filehandle = urllib.urlopen(external_url, proxies=proxies)

To skip proxies entirely (for example, for links within the local network), pass an empty dict:

filehandle = urllib.urlopen(external_url, proxies={})
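In Python 3 the counterpart of proxies={} is an empty ProxyHandler; the urllib.request documentation describes passing an empty dictionary as the way to disable autodetected proxies. A sketch (internal_url is a hypothetical intranet address, so the call is left commented out):

```python
import urllib.request

# An explicitly empty ProxyHandler replaces the default one, so proxies
# from http_proxy/https_proxy (and system settings) are ignored here.
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# response = direct_opener.open(internal_url)  # internal_url: hypothetical
```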

To authenticate to the proxy with a username and password, put them in the proxy URL:

proxies = {'http':'http://username:password@proxy-source-ip:proxy-port',
           'https':'https://username:password@proxy-source-ip:proxy-port'}
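In Python 3, urllib.request percent-decodes the username and password it finds in a proxy URL before sending them, so credentials containing reserved characters can be percent-encoded first with urllib.parse.quote(). A sketch with placeholder credentials and a placeholder host and port:

```python
from urllib.parse import quote, unquote, urlsplit

# Placeholder credentials containing ':' and '@', which would otherwise
# break the username:password@host:port syntax.
username = 'proxy-user'
password = 'p@ss:word'

# safe='' makes quote() encode every reserved character, including '/'.
proxy_url = 'http://%s:%s@proxy-source-ip:3128' % (quote(username, safe=''),
                                                    quote(password, safe=''))
print(proxy_url)  # http://proxy-user:p%40ss%3Aword@proxy-source-ip:3128
proxies = {'http': proxy_url}
```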

Note: avoid special characters such as `:` and `@` in the username and password, since they act as delimiters in the proxy URL.
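Another Python 3 option keeps the credentials out of the proxy URL altogether, so special characters in them do not matter: urllib.request.ProxyBasicAuthHandler supplies them when the proxy replies with HTTP 407 (Proxy Authentication Required). Unlike HTTPBasicAuthHandler, which authenticates to the destination server, it authenticates to the proxy. A sketch with placeholder host and credentials:

```python
import urllib.request

# Placeholder proxy address and credentials, for illustration only.
PROXY = 'http://proxy-source-ip:3128'
USERNAME = 'username'
PASSWORD = 'p@ss:word'  # ':' and '@' are fine here; nothing is parsed as a URL

# realm=None: use these credentials for whatever realm the proxy announces.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, PROXY, USERNAME, PASSWORD)

opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({'http': PROXY}),
    urllib.request.ProxyBasicAuthHandler(password_mgr),
)
# response = opener.open('http://www.google.com/')  # network call, not run here
```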

CDspace
mayure098