I am trying to read the HTML from a URL. I tried the following:
import requests
f = requests.get('http://www.google.com')  # direct request, no proxy
print(f.text)
This raised the following error:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x03142310>: Failed to establish a new connection: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))
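Errno 10060 is Windows' WSAETIMEDOUT: the TCP connection to port 80 was never answered, which is typical of a network that drops direct outbound traffic and expects clients to go through a proxy. A minimal Python 3 sketch for testing a raw TCP connection independently of requests (the function name and default timeout are my own choices, not from the original):

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers timeouts (socket.timeout), refused connections and DNS failures (socket.gaierror)
        return False
```

Comparing `can_connect("www.google.com", 80)` against a known internal host would show whether only outbound traffic is being blocked.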
Because the direct connection timed out, I assume that my workplace (a university) uses a proxy. I used http://www.whatismyproxy.com/ to get the external IP, guessed that the port is 80, and wrote the following code (the IP has been changed):
import requests
link = 'http://www.google.com'  # same URL as the first attempt
f = requests.get(link,
                 proxies={"http": "http://123.45.678.910:80"})  # IP changed for privacy
print(f.text)
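The external IP reported by a site like whatismyproxy.com is not necessarily the proxy's address. A minimal Python 3 sketch, assuming the proxy is configured at the OS or environment level, for asking Python which proxy it can see (in Python 2 the same function is `urllib.getproxies`). The helper names are hypothetical; `urllib.request.getproxies` is the real stdlib call, and on Windows it falls back to the Internet Settings registry but does not evaluate auto-configuration (PAC/WPAD) scripts:

```python
import urllib.request


def system_proxies():
    """Return proxies configured via environment variables (or, on Windows, the registry)."""
    # Does NOT run PAC/WPAD scripts, so an auto-configured proxy may not appear here.
    return urllib.request.getproxies()


def requests_proxies(proxy_url):
    """Build a requests-style proxies mapping that sends both schemes through one proxy."""
    # requests picks the proxy by the target URL's scheme, so an "http"-only
    # mapping leaves https:// URLs going out directly.
    return {"http": proxy_url, "https": proxy_url}
```

Once a real proxy address is known, something like `requests.get('https://www.google.com', proxies=requests_proxies('http://proxy.example.edu:8080'), timeout=10)` would use it; `proxy.example.edu:8080` is a placeholder. Note that requests also honours the `HTTP_PROXY`/`HTTPS_PROXY` environment variables by default.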
With the proxy set, the request returns a page, but the HTML is not Google's (and it is identical if I change the URL to Twitter):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /</title>
</head>
<body>
<h1>Index of /</h1>
<table>
<tr><th valign="top"><img src="/icons/blank.gif" alt="[ICO]"></th><th><a href="?C=N;O=D">Name</a></th><th><a href="?C=M;O=A">Last modified</a></th><th><a href="?C=S;O=A">Size</a></th><th><a href="?C=D;O=A">Description</a></th></tr>
<tr><th colspan="5"><hr></th></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="direct.dat">direct.dat</a></td><td align="right">2013-10-24 18:09 </td><td align="right"> 73 </td><td> </td></tr>
<tr><td valign="top"><img src="/icons/folder.gif" alt="[DIR]"></td><td><a href="errors/">errors/</a></td><td align="right">2015-01-13 16:15 </td><td align="right"> - </td><td> </td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="filtered.dat">filtered.dat</a></td><td align="right">2015-02-06 13:39 </td><td align="right">3.0K</td><td> </td></tr>
<tr><td valign="top"><img src="/icons/folder.gif" alt="[DIR]"></td><td><a href="html/">html/</a></td><td align="right">2016-09-30 07:50 </td><td align="right"> - </td><td> </td></tr>
<tr><td valign="top"><img src="/icons/unknown.gif" alt="[ ]"></td><td><a href="wpad.dat">wpad.dat</a></td><td align="right">2016-03-30 05:16 </td><td align="right">2.5K</td><td> </td></tr>
<tr><th colspan="5"><hr></th></tr>
</table>
<address>Apache/2.4.10 (Debian) Server at www.google.com Port 80</address>
</body></html>
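This response looks like an Apache directory listing ("Index of /"), which suggests the address in `proxies` is an ordinary web server serving its own document root rather than forwarding the request; the `Server at www.google.com` line probably just echoes the `Host` header. A small Python 3 sketch (hypothetical helper names, stdlib `html.parser` only) for detecting such a page and listing its entries, e.g. by passing it `f.text`:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def looks_like_directory_index(html):
    """Heuristic: Apache auto-index pages are titled 'Index of /...'."""
    return "<title>Index of /" in html


def listing_entries(html):
    """Return file/directory links, skipping the column-sort links that start with '?'."""
    parser = LinkCollector()
    parser.feed(html)
    return [link for link in parser.links if not link.startswith("?")]
```

On the output above this would report `direct.dat`, `errors/`, `filtered.dat`, `html/` and `wpad.dat`.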
Is this something I can fix, or is it caused by my workplace's network settings (and how do I confirm which)?
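`wpad.dat` is the conventional file name for Web Proxy Auto-Discovery (WPAD): a proxy auto-config (PAC) script that tells browsers which proxy to use. Assuming that file on the same host can be downloaded (for example with `requests.get` on `http://<that IP>/wpad.dat`, which is untested here), a rough Python 3 sketch for pulling `PROXY host:port` entries out of its text. PAC files are JavaScript, so this regex is only a heuristic and does not evaluate the script's per-URL logic; the function name is hypothetical:

```python
import re

# PAC return values look like "PROXY host:port; PROXY host2:port; DIRECT"
_PROXY_RE = re.compile(r"\bPROXY\s+([A-Za-z0-9.\-]+:\d+)")


def proxies_in_pac(pac_text):
    """Return unique host:port strings from PROXY directives, in order of first appearance."""
    found = []
    for hostport in _PROXY_RE.findall(pac_text):
        if hostport not in found:
            found.append(hostport)
    return found
```

Any `host:port` it finds could then be tried as `"http://" + hostport` in the `proxies` mapping.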