
I am having issues with parsing a website. I'm getting a "403 Forbidden" error. Does that mean I cannot scrape the website? If so, is there some sort of workaround?

import requests
from bs4 import BeautifulSoup
import lxml

URL = 'https://frequentmiler.com/best-credit-card-sign-up-offers/'
webpage = requests.get(URL)

soup = BeautifulSoup(webpage.content, 'lxml')

print(soup.prettify())

This returns:

<html>
 <head>
  <title>
   403 Forbidden
  </title>
 </head>
 <body>
  <center>
   <h1>
    403 Forbidden
   </h1>
  </center>
  <hr/>
  <center>
   nginx
  </center>
 </body>
</html>
ajp093
  • Does this answer your question? [Python requests. 403 Forbidden](https://stackoverflow.com/questions/38489386/python-requests-403-forbidden) – Gino Mempin Feb 27 '21 at 04:15
  • check out the answer in [this](https://stackoverflow.com/a/71018471/10729303) thread. – nipun Feb 07 '22 at 12:21

2 Answers


The website can tell that you're requesting the page from a Python script. You can get around this by adding a user-agent to the request headers.

headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"}
webpage = requests.get(URL, headers=headers)

And now your request looks like it came from a human surfer using an ordinary web browser =).
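Putting the question's script and the header fix together, a minimal sketch of the full flow might look like this (the Chrome user-agent string is just one example; any recent browser string should work, and `fetch_soup` is an illustrative helper name):

```python
import requests
from bs4 import BeautifulSoup

# Example browser user-agent string; any recent browser value should work.
HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/88.0.4324.182 Safari/537.36"
    )
}

def fetch_soup(url):
    """Fetch a page with browser-like headers and parse it with lxml."""
    webpage = requests.get(url, headers=HEADERS)
    webpage.raise_for_status()  # fail loudly instead of parsing an error page
    return BeautifulSoup(webpage.content, 'lxml')

if __name__ == "__main__":
    soup = fetch_soup('https://frequentmiler.com/best-credit-card-sign-up-offers/')
    print(soup.title.get_text(strip=True))
```

`raise_for_status()` is worth keeping: without it, a 403 response still parses "successfully" and you only notice the problem when you inspect the output, as happened here.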

bguernouti
  • Amazing.. that worked! How did you determine the user-agent value? Thank you! – ajp093 Feb 27 '21 at 01:47
  • I just tried it, and I got it on the first try. With a little bit of experience you will be able to figure out those tricks, keep working! :D – bguernouti Feb 27 '21 at 14:10

That means the server understood your request but refuses to authorize it for that URL.
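Since a 403 comes back as a normal response rather than an exception, you can check the status code before parsing. A small sketch using only the standard library (`explain_status` is a hypothetical helper for illustration):

```python
from http import HTTPStatus

def explain_status(status_code):
    """Map an HTTP status code to a short human-readable hint."""
    if status_code == HTTPStatus.FORBIDDEN:     # 403: understood, but refused
        return "server refused the request; try browser-like headers"
    if status_code == HTTPStatus.UNAUTHORIZED:  # 401: credentials missing/invalid
        return "credentials required"
    if status_code == HTTPStatus.OK:            # 200: success
        return "ok"
    return HTTPStatus(status_code).phrase

print(explain_status(403))
```

In the question's script, `webpage.status_code` would have been 403, so checking it (or calling `webpage.raise_for_status()`) surfaces the problem before BeautifulSoup ever sees the error page.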