There is a website that opens fine in a browser but returns an HTTP 403 error when opened in Python as follows:

from bs4 import BeautifulSoup
import urllib2

link = 'http://niezalezna.pl/'

r = urllib2.urlopen(link).read()
soup = BeautifulSoup(r, 'lxml')

print soup.prettify()

The website is a popular news service. Is it possible for a server to return an HTTP 403 error specifically when a URL is opened with code like the above? Thanks,

tsotsi

1 Answer

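Found the answer thanks to the comments above: the server appears to reject urllib2's default User-agent header, so sending a browser-like one avoids the 403. The code is below, and the full answer can be found here: Changing user agent on urllib2.urlopen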

from bs4 import BeautifulSoup
import urllib2

link = 'http://niezalezna.pl/'

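# Build an opener so the default headers can be overridden
opener = urllib2.build_opener()
# Replace the default 'Python-urllib/x.y' User-agent with a browser-like value
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open(link)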

soup = BeautifulSoup(response, 'lxml')

print soup.prettify()
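
For anyone on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. This is only a rough equivalent under a few assumptions: the helper names default_user_agent, build_request and fetch are made up for illustration, and the 10-second timeout is an arbitrary choice. The first helper shows the User-agent that urllib sends when none is set, which is presumably what the server was refusing.

```python
import urllib.request


def default_user_agent():
    """Return the User-agent string urllib sends when none is set explicitly."""
    # A fresh opener carries a single default header: ('User-agent', 'Python-urllib/x.y')
    return dict(urllib.request.build_opener().addheaders)["User-agent"]


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request object that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="Mozilla/5.0", timeout=10):
    """Open the URL with the custom User-Agent and return the body as bytes."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With these in place, the parsing step would be BeautifulSoup(fetch(link), 'lxml'), and print needs parentheses in Python 3. Whether a given site accepts the short 'Mozilla/5.0' string depends on the server, so a fuller browser User-Agent may be needed elsewhere.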
tsotsi