
I am using Python's urllib2 and bs4, but urllib2 is running into issues. Certain sites, such as

http://dannijo.com/jewelry/necklaces/paloma.html

http://www.freepeople.com/

return only the error shown below:

HTTP Error 403: Forbidden

I have seen this question on Stack Overflow: urllib2.HTTPError: HTTP Error 403: Forbidden. But the headers (hdr) suggested there do not get past the 403 Forbidden.

If anyone knows a better hdr, or can tell me what is causing this issue, it would be much appreciated.

This is the code that I currently have:

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}
    req = urllib2.Request(url,headers=hdr)
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
Rorschach
  • That message doesn't come from your code, but from the server. The server says you're not allowed to do something. You could be asking for the wrong thing, or for the right thing in the wrong way. – Chad Miller Jul 24 '15 at 21:08

1 Answer


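You never actually use the req instance: urllib2.urlopen(url) is called with the bare URL, so your custom headers are never sent. Pass the request object instead: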

    soup = BeautifulSoup(urllib2.urlopen(req).read())
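
For completeness, here is a minimal sketch of the whole fetch with that fix applied. The helper names build_request and fetch are hypothetical and used only for illustration, the header dict is shortened from the one in the question, and the import fallback is an assumption so the same code also runs on Python 3, where urllib2 was split into urllib.request and urllib.error. Some servers may still answer 403 for reasons unrelated to headers, so the sketch reports the error instead of crashing.

```python
try:
    # Python 2, as used in the question
    from urllib2 import Request, urlopen, HTTPError
except ImportError:
    # Python 3 moved these names into urllib.request and urllib.error
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError

# Shortened for illustration; the full dict from the question works the same way.
HDR = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
}


def build_request(url, headers=HDR):
    """Return a Request object that carries the custom headers (hypothetical helper)."""
    return Request(url, headers=headers)


def fetch(url):
    """Fetch url with the custom headers; return the body, or None on an HTTP error."""
    req = build_request(url)
    try:
        # Pass req, not url: urlopen(url) would ignore the headers set above.
        return urlopen(req).read()
    except HTTPError as e:
        # The server can still refuse the request for other reasons.
        print('HTTP Error %d: %s' % (e.code, e.msg))
        return None
```

With this in place, the parsing step becomes soup = BeautifulSoup(fetch(url)), guarded by a check that fetch did not return None.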
JuniorCompressor