Website opens in the browser but returns 403 when opened in Python

Question

There is a website that opens fine in a browser but returns an HTTP 403 error when opened in Python as follows:

from bs4 import BeautifulSoup
import urllib2

link = 'http://niezalezna.pl/'

r = urllib2.urlopen(link).read()
soup = BeautifulSoup(r, 'lxml')

print soup.prettify()

The website is a popular news service. Is it possible for a site to deliberately return an HTTP 403 error when it is opened by a piece of code like the one above? Thanks.

This means that the site doesn't allow scrapers. Fake a user agent to get past this. — n1c9, Apr 18 '16 at 23:11
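
To illustrate what the comment refers to: urllib announces itself with a default User-agent header of the form Python-urllib/x.y, which a server can use to turn scripted clients away. Here is a minimal sketch in Python 3 (where urllib2 became urllib.request) that inspects this default without making any network request. The helper name default_user_agent is hypothetical.

```python
import urllib.request


def default_user_agent():
    """Return the User-agent value a fresh urllib opener sends by default."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples added to every request
    headers = dict(opener.addheaders)
    return headers["User-agent"]


# Prints a string that starts with "Python-urllib/"
print(default_user_agent())
```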

score 2 · Answer 1 · edited May 23 '17 at 12:14

Found the answer thanks to the comments above. The code is below and the full answer can be found here: Changing user agent on urllib2.urlopen

from bs4 import BeautifulSoup
import urllib2

link = 'http://niezalezna.pl/'

opener = urllib2.build_opener()
# Replace the default 'Python-urllib/x.y' User-agent with a browser-like one
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open(link)

soup = BeautifulSoup(response, 'lxml')

print soup.prettify()
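
For readers on Python 3, where urllib2 was merged into urllib.request, a rough equivalent might look like the sketch below. The helper names build_request and fetch_html are hypothetical, and whether 'Mozilla/5.0' is enough for any given site is an assumption. Nothing is fetched until fetch_html is called.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent="Mozilla/5.0", timeout=10):
    """Fetch the page body as bytes; urlopen raises HTTPError on a 403."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling fetch_html('http://niezalezna.pl/') and passing the result to BeautifulSoup(html, 'lxml') would mirror the Python 2 code above.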
