
I have only recently started learning Python, although I do have some previous coding experience.

I am trying to scrape something from a website using BeautifulSoup and keep getting an error. I realise this question has been asked before, but I was unsure how to implement the solutions.

Here is my code:

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'http://archive.ontheissues.org/Free_Trade.htm'

#opening up connection, grabbing the page
uClient = uReq(my_url)

The error message I get is:

  File "D:\Anaconda\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: Forbidden

Supposedly, the answer here fixes the problem, but I was unsure how to actually code it and what my entire modified script should look like.

Could someone tell me how I would amend my code?

HonsTh

2 Answers


An alternative is to send a known browser User-Agent header by passing a urllib.request.Request object (rather than the bare URL) to urlopen:

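from urllib.request import Request, urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'http://archive.ontheissues.org/Free_Trade.htm'
# Send a browser-like User-Agent; some servers answer urllib's default one with 403 Forbidden
req = Request(my_url, headers={'User-Agent': 'Mozilla/5.0'})

# Opening up connection, grabbing the page
uClient = uReq(req)
page_html = uClient.read()
uClient.close()

# Parse the downloaded HTML
page_soup = soup(page_html, 'html.parser')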
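If the server still refuses, it helps to see the actual status code instead of a bare traceback. The following is a minimal sketch of the same User-Agent idea, assuming you want the page as bytes and are happy to get None back on an HTTP error. The helper names build_request and fetch_html, the DEFAULT_UA constant and the 10-second timeout are hypothetical choices for illustration, not part of the answer above.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Hypothetical default; 'Mozilla/5.0' is the value used in the answer above
DEFAULT_UA = 'Mozilla/5.0'


def build_request(url, user_agent=DEFAULT_UA):
    """Wrap url in a Request that carries a browser-like User-Agent header."""
    return Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, user_agent=DEFAULT_UA, timeout=10):
    """Download url and return its body as bytes, or None if the server refuses."""
    req = build_request(url, user_agent)
    try:
        # The with-block closes the connection even if reading fails
        with urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except HTTPError as err:
        # err.code is the numeric status, e.g. 403 for Forbidden
        print(f'Server returned {err.code} {err.reason} for {url}')
        return None
```

Usage would be something like html = fetch_html('http://archive.ontheissues.org/Free_Trade.htm'), then soup(html, 'html.parser') if the result is not None. Note that urllib normalises header names with str.capitalize(), so the header is stored as 'User-agent'; this does not affect what the server receives.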
  • Thanks for this, it's much easier to implement than some of the other solutions I looked at. My main goal was to get the text from the links that go to "X Full quotes by XXXX". Would you know how to proceed? – HonsTh Jul 05 '19 at 07:39
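For the follow-up in the comment, one possible approach is to collect every link whose visible text contains "Full quotes by" and turn its href into an absolute URL. This is a sketch under the assumption that those links look roughly like <a href="...">N Full quotes by Name</a>; the real page markup was not checked and may differ. The function name full_quote_links and the sample HTML (including "Person A" and person_a.htm) are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def full_quote_links(html, base_url):
    """Return (link text, absolute URL) pairs for links mentioning 'Full quotes by'.

    Assumption: the target links contain that phrase in their visible text.
    """
    page_soup = BeautifulSoup(html, 'html.parser')
    results = []
    for a in page_soup.find_all('a', href=True):
        # get_text joins nested text pieces, so <a><b>..</b> ..</a> still works
        text = a.get_text(' ', strip=True)
        if 'full quotes by' in text.lower():
            # Resolve relative hrefs against the page the links came from
            results.append((text, urljoin(base_url, a['href'])))
    return results


# Made-up sample markup, not taken from the real page
sample = '''
<p><a href="person_a.htm">3 Full quotes by Person A</a></p>
<p><a href="other.htm">Something else</a></p>
'''
print(full_quote_links(sample, 'http://archive.ontheissues.org/Free_Trade.htm'))
# [('3 Full quotes by Person A', 'http://archive.ontheissues.org/person_a.htm')]
```

With the real page you would pass the downloaded page_html instead of sample, then fetch each returned URL the same way (with the User-Agent header) to get the quote text.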

Use requests. It is much more convenient than urllib, as it does the heavy lifting for you:

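# pip install requests beautifulsoup4 html5lib

from requests import Session
from bs4 import BeautifulSoup


my_url = 'http://archive.ontheissues.org/Free_Trade.htm'

# A Session reuses the underlying connection across requests
with Session() as s:
    r = s.get(my_url)
    r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx, e.g. 403 Forbidden

# get soup ('html5lib' is a separate install; 'html.parser' needs none)
soup = BeautifulSoup(r.content, 'html5lib')

print(soup.prettify())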
Prayson W. Daniel