Getting a website with urllib results in HTTP 405 error

Question

I'm learning BeautifulSoup and was trying to write a small script to find houses on a Dutch real estate website. When I try to get the website's content, I immediately get an HTTP 405 error:

  File "funda.py", line 2, in <module>
    html = urlopen("http://www.funda.nl")
  File "<folders>request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "<folders>request.py", line 532, in open
    response = meth(req, response)
  File "<folders>request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "<folders>request.py", line 570, in error
    return self._call_chain(*args)
  File "<folders>request.py", line 504, in _call_chain
    result = func(*args)
  File "<folders>request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 405: Not Allowed

What I'm trying to execute:

from urllib.request import urlopen
html = urlopen("http://www.funda.nl")

Any idea why this results in an HTTP 405? I'm just doing a GET request, right?

It's definitely a GET request, but you're being detected as a bot, and this particular server sends a 405 error code in that case. Try tuning the headers to appear as a normal browser. — leovp, Apr 02 '17 at 10:23
Related - https://stackoverflow.com/questions/27652543/how-to-use-python-requests-to-fake-a-browser-visit?noredirect=1&lq=1 — shad0w_wa1k3r, Apr 02 '17 at 10:25
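Both points in the comments can be checked locally without contacting the site. The sketch below only inspects what urllib would send: the request method, and the default User-Agent that a plain urlopen() call uses. The variable names are illustrative. Whether this particular server keys on that header alone is an assumption; the thread only establishes that changing it helped.

```python
import urllib.request

# Build the same request the question makes, without sending it.
req = urllib.request.Request("http://www.funda.nl")

# With no data attached, urllib uses GET.
method = req.get_method()

# A default opener carries the header urllib adds when none is supplied.
opener = urllib.request.build_opener()
default_ua = dict(opener.addheaders)["User-agent"]

print(method)      # GET
print(default_ua)  # Python-urllib/<major>.<minor>
```

A User-Agent of the form Python-urllib/3.x is easy for a server to recognise as a script rather than a browser.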

score 3 · Accepted Answer · answered Apr 02 '17 at 10:34

Possible duplicate of HTTPError: HTTP Error 403: Forbidden. You need to make the request look like it comes from a regular visitor. This is usually done (it varies from site to site) by sending a common browser User-Agent HTTP header.

>>> url = "http://www.funda.nl"
>>> import urllib.request
>>> req = urllib.request.Request(
...     url, 
...     data=None, 
...     headers={
...         'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
...     }
... )
>>> f = urllib.request.urlopen(req)
>>> f.status, f.msg
(200, 'OK')

Using the requests library:

>>> import requests
>>> response = requests.get(
...     url,
...     headers={
...         'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
...     }
... )
>>> response.status_code
200
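For a script rather than an interactive session, the urllib variant can be wrapped in a small helper that also handles the HTTPError from the question. This is a minimal sketch, not a definitive implementation: build_request, fetch_html and BROWSER_UA are hypothetical names introduced here, the timeout value is arbitrary, and the server may still reject requests for reasons other than the User-Agent.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Browser-like User-Agent taken from the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/35.0.1916.47 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a GET request that carries a browser-like User-Agent."""
    return Request(url, data=None, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the server answers with an HTTP error."""
    try:
        with urlopen(build_request(url), timeout=timeout) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except HTTPError as err:
        print(f"Server refused the request: {err.code} {err.reason}")
        return None
```

Calling fetch_html("http://www.funda.nl") would then return the HTML as a string that can be passed on to BeautifulSoup, or None if the server still responds with an error such as 405.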

score -3 · Answer 2 · answered Apr 02 '17 at 10:27

It works if you use Python 2's legacy urllib module instead of Requests or urllib2 (note that urllib.urlopen does not exist in Python 3):

import urllib
html = urllib.urlopen("http://www.funda.nl")

leovp's comment makes sense.

Nandakumar Edamana
