
Possible Duplicate:
Fetch a Wikipedia article with Python

>>> import urllib2
>>> print urllib2.urlopen('http://zh.wikipedia.org/wiki/%E6%AF%9B%E6%B3%BD%E4%B8%9C').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
Hanfei Sun
  • Check their [robots.txt](http://zh.wikipedia.org/robots.txt) page to see whether the bot you're using is banned. – elssar Aug 05 '12 at 06:06

2 Answers


You need to send a User-Agent header with the request; without one you'll get a 403, as you did.

On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. See our User-Agent policy. Other MediaWiki installations may have similar policies.

So just add a user-agent to your code and it should work fine.
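
As a minimal sketch of that fix, the request can be built with an explicit User-Agent header before it is opened. The names `USER_AGENT`, `build_request` and `fetch` are made up for illustration, and the agent string is a placeholder you would replace with something that identifies your own script. The import fallback is an assumption so the same code also runs on Python 3, where `urllib2` became `urllib.request`.

```python
try:
    import urllib2 as urlreq          # Python 2, as used in the question
except ImportError:
    import urllib.request as urlreq   # Python 3 equivalent module

# Hypothetical agent string: identify your script and give a contact.
USER_AGENT = 'ExampleWikiFetcher/0.1 (you@example.org)'


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries a non-empty User-Agent header."""
    return urlreq.Request(url, headers={'User-Agent': user_agent})


def fetch(url):
    """Open the URL with the custom User-Agent and return the raw body."""
    return urlreq.urlopen(build_request(url)).read()

# Usage (performs a real network request):
# html = fetch('http://zh.wikipedia.org/wiki/%E6%AF%9B%E6%B3%BD%E4%B8%9C')
```

Passing the header through the `headers` argument keeps everything in one call; the default `urlopen(url)` form from the question gives you no place to set it.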

elssar

Try downloading the page with wget or cURL.
If you can't, you might have a network problem.
If you can, Wikipedia might be blocking certain user agents. In that case, use the add_header method of a urllib2.Request object to set a custom user agent (to imitate a browser request).
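
The same check can be roughly sketched in Python itself: set the header with `add_header`, then tell an HTTP-level refusal (such as the 403) apart from a plain network failure. The helper names `make_request`, `describe_failure` and `fetch` are hypothetical, the messages are illustrative, and the import fallback is an assumption that lets the sketch run on Python 3 as well.

```python
try:
    import urllib2 as urlreq          # Python 2, as used in the question
except ImportError:
    import urllib.request as urlreq   # Python 3 equivalent module


def make_request(url, user_agent):
    """Build a Request and attach the User-Agent via add_header."""
    req = urlreq.Request(url)
    req.add_header('User-Agent', user_agent)
    return req


def describe_failure(exc):
    """Distinguish a server refusal from a network-level problem."""
    # HTTPError is a subclass of URLError, so it is checked first.
    if isinstance(exc, urlreq.HTTPError):
        return 'server answered with HTTP %d' % exc.code
    return 'network problem: %s' % exc.reason


def fetch(url, user_agent):
    """Return the page body, or raise RuntimeError with a short diagnosis."""
    try:
        return urlreq.urlopen(make_request(url, user_agent)).read()
    except urlreq.URLError as exc:
        raise RuntimeError(describe_failure(exc))
```

If `fetch` reports an HTTP 403 while wget or cURL succeed for the same URL, the user agent is the likely culprit rather than the network.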

EyalAr
  • You don't need to imitate a browser request; any custom user agent will do. I got results using the user agent 'elssar-at-elssar-laptop'. – elssar Aug 05 '12 at 06:49