
So I am trying to use Python 2.7 to do various things that require pulling data from the internet. I have not been very successful, and I am looking for help diagnosing what I am doing wrong.

Firstly, I managed to get pip to work by defining the proxy like so: pip install --proxy=http://username:password@someproxy.com:8080 numpy. Hence Python must be capable of getting through the proxy!

However, when it came to actually writing a .py script that could do the same, I had no success. I tried the following code with urllib2 first:

import urllib2

uri = "http://www.python.org"
http_proxy_server = "someproxyserver.com"
http_proxy_port = "8080"
http_proxy_realm = http_proxy_server
http_proxy_user = "username"
http_proxy_passwd = "password"

# Next line = "http://username:password@someproxyserver.com:8080"
http_proxy_full_auth_string = "http://%s:%s@%s:%s" % (http_proxy_user,
                                                      http_proxy_passwd,
                                                      http_proxy_server,
                                                      http_proxy_port)

def open_url_no_proxy():
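    # Baseline: fetch directly, with no proxy configured; behind a
    # mandatory proxy this is likely the call that times out (errno 10060)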
    urllib2.urlopen(uri)

    print "Apparent success without proxy server!"    

def open_url_installed_opener():
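    # Install a ProxyHandler globally so urllib2.urlopen routes
    # through the authenticated proxy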
    proxy_handler = urllib2.ProxyHandler({"http": http_proxy_full_auth_string})

    opener = urllib2.build_opener(proxy_handler)
    urllib2.install_opener(opener)
    urllib2.urlopen(uri)

    print "Apparent success through proxy server!"

if __name__ == "__main__":
    open_url_no_proxy()
    open_url_installed_opener()

However, I just get this error:

URLError: <urlopen error [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

Then I tried urllib3, as this is the module used by pip to handle proxies:

from urllib3 import ProxyManager, make_headers

# Establish the Authentication Settings
default_headers = make_headers(basic_auth='username:password')
http = ProxyManager("https://www.proxy.com:8080/", headers=default_headers)

# Now you can use `http` as you would a normal PoolManager
r = http.request('GET', 'https://www.python.org/')

# Check data is from destination
print(r.data)

I got this error:

raise MaxRetryError(_pool, url, error or ResponseError(cause))
MaxRetryError: HTTPSConnectionPool(host='www.python.org', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', error('Tunnel connection failed: 407 Proxy Authorization Required',)))

I would really appreciate any help diagnosing this issue.

Tom
  • Is your proxy on https:// or http://? In the pip example it's http://, but urllib3 example it's https://. – shazow Jul 02 '15 at 08:43
  • If that doesn't work, you could try using Requests (built on urllib3, also used by pip): http://docs.python-requests.org/en/latest/user/advanced/?highlight=proxy#proxies – shazow Jul 02 '15 at 08:47
  • Yeah, I have played around with http vs https; when I set it to http using urllib3 it doesn't raise any errors, but it returns a page telling me that the proxy requires authentication. – Tom Jul 02 '15 at 08:50
  • I tried a script with requests, and I was getting similar errors. I am starting to think it's got something to do with the authentication details I am giving it. – Tom Jul 02 '15 at 08:51
  • Could be. It's strange that pip works. Are you certain that pip is actually hitting the proxy and not ignoring it somehow? You could use something like tcpdump/ngrep to monitor traffic and see what it's actually doing. E.g. https://stackoverflow.com/questions/9241391/how-to-capture-all-the-http-packets-using-tcpdump – shazow Jul 02 '15 at 15:17
  • So I managed to get the urllib3 script to work on another machine (within the same network). It worked once, and then never again. I tried changing the url and it still doesn't work. Why is it behaving like this? – Tom Jul 06 '15 at 01:51
  • Sounds like a problem with the proxy or network. – shazow Jul 06 '15 at 08:44

1 Answer


The solution to my problem was to use the requests module; see this thread: Proxies with Python 'Requests' module
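The key point is that requests takes a proxies dict mapping each scheme to a proxy URL with the credentials embedded in it. A minimal sketch, with placeholder credentials and proxy host:

import requests

# Placeholder proxy URL; substitute real credentials and host
proxy = "http://user:pw@proxy:8080"
proxies = {"http": proxy, "https": proxy}

r = requests.get("http://www.python.org", proxies=proxies)
print r.status_code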

mtt2p listed the fuller code below, which worked for me:

import requests
import time

class BaseCheck():
    def __init__(self, url):
        # Proxy URLs with basic-auth credentials embedded (placeholders)
        self.http_proxy  = "http://user:pw@proxy:8080"
        self.https_proxy = "http://user:pw@proxy:8080"
        self.ftp_proxy   = "http://user:pw@proxy:8080"
        # requests expects a dict mapping scheme -> proxy URL
        self.proxyDict = {
            "http"  : self.http_proxy,
            "https" : self.https_proxy,
            "ftp"   : self.ftp_proxy
        }
        self.url = url

        # Timing helpers: record start/end times for each named step
        def makearr(tsteps):
            global stemps
            global steps
            stemps = {}
            for step in tsteps:
                stemps[step] = {'start': 0, 'end': 0}
            steps = tsteps
        makearr(['init', 'check'])

        def starttime(typ=""):
            for stemp in stemps:
                if typ == "":
                    stemps[stemp]['start'] = time.time()
                else:
                    stemps[stemp][typ] = time.time()
        starttime()

    def __str__(self):
        return str(self.url)

    def getrequests(self):
        # The actual proxied request
        g = requests.get(self.url, proxies=self.proxyDict)
        print g.status_code
        print g.content
        print self.url
        # Elapsed time since the timers were started in __init__
        stemps['init']['end'] = time.time()
        x = stemps['init']['end'] - stemps['init']['start']
        print x


test = BaseCheck(url='http://google.com')
test.getrequests()
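For completeness: the 407 from the urllib3 attempt in the question suggests the basic-auth header was being sent to the destination site rather than to the proxy. ProxyManager accepts a separate proxy_headers argument for that, and make_headers has a proxy_basic_auth option (in recent urllib3 versions). A sketch along those lines, untested, with the same placeholder credentials:

from urllib3 import ProxyManager, make_headers

# proxy_basic_auth builds a Proxy-Authorization header;
# proxy_headers (not headers) delivers it to the proxy itself
auth_headers = make_headers(proxy_basic_auth='username:password')
http = ProxyManager("http://someproxyserver.com:8080/", proxy_headers=auth_headers)

r = http.request('GET', 'http://www.python.org/')
print(r.status)
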
Tom