OS: Windows 7; Python 2.7.6 using the Python GUI Shell
I'm trying to crawl a website with a Python script, and several authors recommend the urllib and urllib2 libraries. To store the HTML content of a URL in a variable, I've seen the following approach proposed:
import urllib2
c=urllib2.urlopen('http://en.wikipedia.org/wiki/Rocket_Internet')
contents=c.read()
print contents
urlopen raises an error after 120+ seconds:
Traceback (most recent call last):
  File "H:/Movie_Knowledge_Graph/crawl.py", line 4, in <module>
    c=urllib2.urlopen('http://en.wikipedia.org/wiki/Rocket_Internet')
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 404, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 422, in _open
    '_open', req)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Python27\lib\urllib2.py", line 1184, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>
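Errno 10060 is a Windows TCP connect timeout, which means the connection fails before any HTTP traffic happens. To confirm it's a network-level problem rather than anything in urllib2, I tried a raw socket connect (a quick diagnostic sketch; `can_connect` is just a helper name I made up):

```python
import socket

# Hypothetical helper: returns True if a plain TCP connection to
# (host, port) succeeds within `timeout` seconds. If this fails for
# en.wikipedia.org on port 80, the problem is at the network level
# (firewall, DNS, proxy), not in urllib2 itself.
def can_connect(host, port, timeout=5):
    try:
        s = socket.create_connection((host, port), timeout)
        s.close()
        return True
    except socket.error:
        return False
```

For example, `can_connect('en.wikipedia.org', 80)` should return True on a working connection.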
I'm aware that environment variables have to be set accordingly when using a proxy. But I'm on my home WiFi network, which requires no proxy. I tried urllib as well, and it raises the same error.