
I am using Martin Konecny's code from here to query an HTTP site from behind my corporate firewall:

The code is this:

    import urllib.request

    req = urllib.request.Request(
        'http://www.espncricinfo.com/',
        data=None,
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
        }
    )

    f = urllib.request.urlopen(req)
    g = open('writing.txt', 'w')
    g.write(f.read().decode('utf-8'))
    g.close()
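The code above calls `urlopen()` with urllib's default opener, which routes requests through whatever `urllib.request.getproxies()` reports: environment variables named `<scheme>_proxy` (such as `http_proxy`) first, then the system proxy settings on Windows and macOS. urllib does not fetch or evaluate PAC files. A quick check of what urllib actually sees might look like this (`show_detected_proxies` is just an illustrative name):

```python
import urllib.request


def show_detected_proxies():
    """Return the scheme -> proxy URL mapping urllib's default opener uses.

    Environment variables named <scheme>_proxy are checked first; on
    Windows and macOS the system proxy settings are the fallback.
    PAC (auto-config) scripts are not evaluated.
    """
    return urllib.request.getproxies()


if __name__ == "__main__":
    proxies = show_detected_proxies()
    if proxies:
        for scheme, proxy_url in proxies.items():
            print(f"{scheme}: {proxy_url}")
    else:
        print("No proxy detected; urlopen() will try to connect directly.")
```

If nothing is detected, setting `http_proxy` (and `https_proxy`) to the proxy's address before running the script is one alternative to hard-coding a `ProxyHandler`.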

However, when I run this code, I receive the PAC file instead of the contents of the URL.

How do I get past it and download the contents of the website at the given URL?
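A PAC file is a small JavaScript program whose `FindProxyForURL(url, host)` function returns strings such as `"PROXY host:port; DIRECT"`. Since urllib can't run it, one rough way to find your proxy's `host:port` is to pull the `PROXY` entries out of the file's text. This is a naive sketch, assuming the file lists its proxies literally; it ignores the script's conditions, and `proxies_from_pac` and the `corp.example` hostnames are made up for illustration:

```python
import re

# Matches PAC directives of the form "PROXY host:port".
# Assumption: proxies appear literally in the file; the JavaScript logic
# in FindProxyForURL is NOT evaluated, so every listed proxy is returned.
PROXY_DIRECTIVE = re.compile(r"\bPROXY\s+([A-Za-z0-9.\-]+:\d+)")

# Made-up PAC file contents, for illustration only.
SAMPLE_PAC = """
function FindProxyForURL(url, host) {
    if (isPlainHostName(host)) return "DIRECT";
    return "PROXY proxy1.corp.example:8080; PROXY proxy2.corp.example:3128; DIRECT";
}
"""


def proxies_from_pac(pac_text):
    """Return unique host:port strings from PROXY directives, in file order."""
    found = []
    for match in PROXY_DIRECTIVE.finditer(pac_text):
        hostport = match.group(1)
        if hostport not in found:
            found.append(hostport)
    return found


if __name__ == "__main__":
    # With a real PAC file, pass its text instead, e.g. open('writing.txt').read().
    print(proxies_from_pac(SAMPLE_PAC))
```

If the PAC script picks different proxies for different hosts, you would still need to read its logic to choose the right one.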

Thank you!

Asked by Soham

1 Answer

    import urllib.request

    req = urllib.request.Request('http://www.espncricinfo.com/', data=None, headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    })

    # Route http:// requests through the proxy; replace 'ip:port' with
    # your corporate proxy's address and port.
    proxy_support = urllib.request.ProxyHandler({'http': 'ip:port'})
    opener = urllib.request.build_opener(proxy_support)
    # Make this opener the global default, so urlopen() uses it.
    urllib.request.install_opener(opener)

    f = urllib.request.urlopen(req)

    g = open('writing.txt', 'w')
    g.write(f.read().decode('utf-8'))
    g.close()
Answered by ergesto
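A variation on the answer, for cases where you'd rather not replace the global opener or where `https://` URLs also need to go through the proxy. This is a minimal sketch under stated assumptions: `proxy.example.com:8080` is a made-up placeholder for your proxy, and `build_proxy_opener` and `download` are illustrative helper names, not part of urllib.

```python
import urllib.request

# Hypothetical placeholder: substitute your corporate proxy's host and port.
PROXY = "http://proxy.example.com:8080"


def build_proxy_opener(proxy_url):
    """Return an opener that sends http:// and https:// requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def download(url, path, opener):
    """Fetch url through opener and save the decoded body to path."""
    # Shortened User-Agent for readability; the question's code sends a full Chrome string.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener.open(req, timeout=30) as resp:
        # Decode with the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset)
    with open(path, "w", encoding="utf-8") as out:
        out.write(body)
```

Usage would be something like `download('http://www.espncricinfo.com/', 'writing.txt', build_proxy_opener(PROXY))`. The `with` blocks close the response and the output file even if an error occurs. If the proxy requires a username and password, urllib accepts them embedded in the proxy URL (`http://user:password@host:port`) and sends Basic proxy authentication; proxies that only accept NTLM won't work this way.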