Python urllib2 cannot fetch a specific URL

Question

I'm using urllib2 to request URLs and read their contents, but it fails for some URLs. Consider these two calls:

import urllib2

# No problem with this URL
urllib2.urlopen('http://www.huffingtonpost.com/2014/07/19/todd-akin-slavery_n_5602083.html')
# This one raises an error
urllib2.urlopen('http://www.foxnews.com/us/2014/07/19/cartels-suspected-as-high-caliber-gunfire-sends-border-patrol-scrambling-on-rio/')

The second URL produces an error like this:

Traceback (most recent call last):
  File "D:/Developer Center/Republishan/republishan2/republishan2/test.py", line 306, in <module>
    urllib2.urlopen('http://www.foxnews.com/us/2014/07/19/cartels-suspected-as-high-caliber-gunfire-sends-border-patrol-scrambling-on-rio/')
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

What's the problem with this?

This answer worked with the url you provided, using urllib2 and changing the user-agent: http://stackoverflow.com/a/5196160/2679935 — julienc, Jul 20 '14 at 08:58

Wally · Accepted Answer · 2016-06-21T11:28:49.297

I think the site is checking the User-Agent and/or other headers. urllib2 identifies itself with a default User-Agent of the form Python-urllib/2.7, which some sites reject.

You can set a User-Agent manually.
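As a minimal sketch of setting the header by hand: the helper names build_request and fetch, the USER_AGENT string, and the 10-second timeout are illustrative assumptions, not part of the question. The import fallback only lets the same code run on Python 3, where urllib2 became urllib.request. Whether this actually clears the 404 depends on what the site checks.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent module

# Hypothetical browser-like value; any string the site accepts will do.
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; rv:30.0) Gecko/20100101 Firefox/30.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib2.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT, timeout=10):
    """Open the URL with the custom header and return the raw body."""
    response = urllib2.urlopen(build_request(url, user_agent), timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Calling fetch() with the failing URL from the question sends the request with the browser-like header instead of the Python-urllib default. If the server still answers 404, it is probably checking something other than the User-Agent.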

The Requests library sets its own User-Agent automatically (python-requests/<version>).

However, remember that the Requests User-Agent may also be blocked by some sites.

Try this; it works for me. You need to install the requests module first:

pip install requests

Then

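import requests

# Requests sends its own default User-Agent header
r = requests.get("http://www.foxnews.com/us/2014/07/19/cartels-suspected-as-high-caliber-gunfire-sends-border-patrol-scrambling-on-rio/")

# Parenthesized form works on both Python 2 and Python 3
print(r.text)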

urllib2 is lower-level and makes you write more code. Requests is simpler and more in line with the Python philosophy that code should be beautiful.
