How to fix HTTPError: Forbidden in urllib/urlopen

Question

I started learning Python recently, although I do have some previous coding experience.

I am trying to scrape something from a website using BeautifulSoup and keep getting an error. I realise this question has been posted before, but I was unsure how to implement the solutions.

Here is my code:

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'http://archive.ontheissues.org/Free_Trade.htm'

#opening up connection, grabbing the page
uClient = uReq(my_url)

The error message I get is:

  File "D:\Anaconda\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: Forbidden

Supposedly, the answer here fixes the problem, but I was unsure how to actually code it and what my entire modified script should look like.

Could someone tell me how I would amend my code?

score 1 · Accepted Answer · answered Jul 04 '19 at 18:37


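One option is to send a known browser User-Agent header by wrapping the URL in a Request object. By default urllib identifies itself as "Python-urllib/x.y", and some servers refuse that with 403 Forbidden: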

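from urllib.request import Request, urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'http://archive.ontheissues.org/Free_Trade.htm'

# send a browser-like User-Agent instead of urllib's default one
req = Request(my_url, headers={'User-Agent': 'Mozilla/5.0'})

# opening up connection, grabbing the page
uClient = uReq(req)
page_html = uClient.read()
uClient.close()

# parse the downloaded HTML
page_soup = soup(page_html, 'html.parser')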
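
If you want to see the effect of the header without hitting the real site, here is a minimal sketch that runs entirely on your own machine. Everything in it is an assumption made for illustration: `PickyHandler` is a hypothetical local server that refuses urllib's default User-Agent (I am only guessing that the real site does something similar), and `fetch` is a helper name I made up. It goes through an opener with proxies disabled so the local request is not routed elsewhere; `urlopen` uses the same machinery and the same default header.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, Request, build_opener


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that rejects urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Python-urllib"):
            self.send_error(403, "Forbidden")
            return
        body = b"<html><body><p>Hello</p></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Opener that ignores any system proxy settings (assumption: local demo only).
opener = build_opener(ProxyHandler({}))


def fetch(url, user_agent=None):
    """Return (status_code, body_bytes); body is b'' when the server refuses."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    try:
        with opener.open(Request(url, headers=headers), timeout=5) as response:
            return response.status, response.read()
    except HTTPError as err:
        return err.code, b""


server = HTTPServer(("127.0.0.1", 0), PickyHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
local_url = f"http://127.0.0.1:{server.server_port}/"

default_result = fetch(local_url)                 # urllib's default User-Agent
browser_result = fetch(local_url, "Mozilla/5.0")  # browser-like User-Agent

server.shutdown()
server.server_close()

print(default_result[0], browser_result[0])
```

The first request is refused and the second one gets the page, which is the same pattern as in the question. Catching `HTTPError` also lets the script report the status code instead of ending with a traceback.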

answered Jul 04 '19 at 18:37

Mohamed Yilmaz


Thanks for this, it's much easier to implement than some of the other solutions I had a look at. My main goal was to try and get the text from the links that go to "X Full quotes by XXXX". Would you know how to proceed? – HonsTh Jul 05 '19 at 07:39
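
For the follow-up about the "Full quotes" links, one possible starting point is to collect every link whose text contains that phrase. This is only a sketch under assumptions: I have not checked the real page's markup, so `sample_html` is invented, and `QuoteLinkCollector` is a hypothetical helper name. It uses the standard library's `html.parser` so it runs without extra installs; with BeautifulSoup the same idea would be to loop over `find_all('a')` and check each link's text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class QuoteLinkCollector(HTMLParser):
    """Collect (text, absolute_url) for <a> tags whose text contains a phrase."""

    def __init__(self, base_url, phrase):
        super().__init__()
        self.base_url = base_url
        self.phrase = phrase.lower()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # collapse line breaks and repeated spaces inside the link text
            text = " ".join("".join(self._text).split())
            if self.phrase in text.lower():
                self.links.append((text, urljoin(self.base_url, self._href)))
            self._href = None


# Invented markup for illustration; the real page may be structured differently.
sample_html = """
<p>Some intro text. <a href="index.htm">Home</a></p>
<p><a href="Person_A_Free_Trade.htm">Click here for 12 full quotes by Person A</a></p>
<p><a href="Person_B_Free_Trade.htm">Click here for 3 full
quotes by Person B</a></p>
"""

collector = QuoteLinkCollector(
    "http://archive.ontheissues.org/Free_Trade.htm", "full quotes"
)
collector.feed(sample_html)

for text, url in collector.links:
    print(text, "->", url)
```

Each collected URL could then be fetched the same way as the first page (with the User-Agent header) and parsed for the quote text.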

Prayson W. Daniel · Answer 2 · 2019-07-04T18:28:23.660

0

Use requests. It is more convenient because it handles the low-level HTTP work (what you would otherwise do by hand with urllib) for you:

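# pip install requests html5lib

from requests import Session
from bs4 import BeautifulSoup

my_url = 'http://archive.ontheissues.org/Free_Trade.htm'

s = Session()
r = s.get(my_url)
r.raise_for_status()  # raise an exception on 4xx/5xx responses such as 403

# get soup (the 'html5lib' parser must be installed separately)
soup = BeautifulSoup(r.content, 'html5lib')

print(soup.prettify())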

edited Jul 04 '19 at 18:28

answered Jul 04 '19 at 18:22

Prayson W. Daniel

