I want to scrape data from a website, but I keep getting HTTP Error 405: Not Allowed. What am I doing wrong?
(I have looked at the documentation, and tried their code, with only my url in place of the example's; I still have the same error.)
Here's the code:
import urllib.request

list_url = ["http://www.glassdoor.com/Reviews/WhiteWave-Reviews-E9768.htm"]
for url in list_url:
    # Send a browser-like User-Agent header with the request
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    response = urllib.request.urlopen(req).read()
If I omit the User-Agent header, I get HTTP Error 403: Forbidden instead.
In the past, I have successfully scraped data (from another website) using the following:
from bs4 import BeautifulSoup

for url in list_url:
    raw_html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(raw_html, "lxml")
Ideally, I would like to keep a similar structure, that is, pass the content of the fetched URL to BeautifulSoup. Thanks!
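
A minimal sketch of that structure, for reference. The names `BROWSER_HEADERS`, `build_request`, and `fetch_html` are hypothetical helpers, and the extra `Accept` and `Accept-Language` headers are only an assumption about what a browser-like request might include; nothing here is confirmed to avoid the 405 on this particular site. The point is to report the HTTP status instead of letting the exception abort the loop.

```python
import urllib.error
import urllib.request

# Assumed browser-like headers; which of these (if any) the site checks is unknown.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Build a urllib Request that carries the given headers."""
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None on an HTTP error status.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    the function can be exercised without network access.
    """
    req = build_request(url)
    try:
        with opener(req) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # err.code is the numeric status (e.g. 403 or 405), err.reason the text
        print(f"{url}: HTTP {err.code} {err.reason}")
        return None
```

Inside the existing loop, `raw_html = fetch_html(url)` would replace the direct `urlopen` call, and any non-`None` result could be passed to `BeautifulSoup(raw_html, "lxml")` as before.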