
I am having issues with parsing a website. I'm getting a "403 Forbidden" error. Does that mean I cannot scrape the website? If so, is there some sort of workaround?

import requests
from bs4 import BeautifulSoup
import lxml

URL = 'https://frequentmiler.com/best-credit-card-sign-up-offers/'
webpage = requests.get(URL)

soup = BeautifulSoup(webpage.content, 'lxml')

print(soup.prettify())

This returns:

<html>
 <head>
  <title>
   403 Forbidden
  </title>
 </head>
 <body>
  <center>
   <h1>
    403 Forbidden
   </h1>
  </center>
  <hr/>
  <center>
   nginx
  </center>
 </body>
</html>
ajp093
  • Does this answer your question? [Python requests. 403 Forbidden](https://stackoverflow.com/questions/38489386/python-requests-403-forbidden) – Gino Mempin Feb 27 '21 at 04:15
  • check out the answer in [this](https://stackoverflow.com/a/71018471/10729303) thread. – nipun Feb 07 '22 at 12:21

2 Answers


The website can tell that you're requesting the page from a Python script. You can get around this by adding a user-agent to the request headers.

headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"}
webpage = requests.get(URL, headers=headers)

And now your request looks like it came from a human surfer using an ordinary web browser =).
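Putting the question's script and the header fix together, a minimal sketch of the full flow might look like this (the Chrome user-agent string is just one example; any recent browser string should work, and `fetch_soup` is an illustrative helper name):

```python
import requests
from bs4 import BeautifulSoup

# Example browser user-agent string; any recent browser value should work.
HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/88.0.4324.182 Safari/537.36"
    )
}

def fetch_soup(url):
    """Fetch a page with browser-like headers and parse it with lxml."""
    webpage = requests.get(url, headers=HEADERS)
    webpage.raise_for_status()  # fail loudly instead of parsing an error page
    return BeautifulSoup(webpage.content, 'lxml')

if __name__ == "__main__":
    soup = fetch_soup('https://frequentmiler.com/best-credit-card-sign-up-offers/')
    print(soup.title.get_text(strip=True))
```

`raise_for_status()` is worth keeping: without it, a 403 response still parses "successfully" and you only notice the problem when you inspect the output, as happened here.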

bguernouti
  • Amazing.. that worked! How did you determine the user-agent value? Thank you! – ajp093 Feb 27 '21 at 01:47
  • I just tried it, and I got it on the first try. With a little bit of experience you will be able to figure out those tricks, keep working! :D – bguernouti Feb 27 '21 at 14:10

That means the server understood your request but refuses to authorize it for that URL.
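Since a 403 comes back as a normal response rather than an exception, you can check the status code before parsing. A small sketch using only the standard library (`explain_status` is a hypothetical helper for illustration):

```python
from http import HTTPStatus

def explain_status(status_code):
    """Map an HTTP status code to a short human-readable hint."""
    if status_code == HTTPStatus.FORBIDDEN:     # 403: understood, but refused
        return "server refused the request; try browser-like headers"
    if status_code == HTTPStatus.UNAUTHORIZED:  # 401: credentials missing/invalid
        return "credentials required"
    if status_code == HTTPStatus.OK:            # 200: success
        return "ok"
    return HTTPStatus(status_code).phrase

print(explain_status(403))
```

In the question's script, `webpage.status_code` would have been 403, so checking it (or calling `webpage.raise_for_status()`) surfaces the problem before BeautifulSoup ever sees the error page.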