
I was building a web scraper to pull hrefs from https://www.startengine.com/explore, but I was struggling to get any. I decided to print the page and found out why.

Here is my code:

import pandas as pd
import os
import requests
from bs4 import BeautifulSoup
import re

URL = "https://www.startengine.com/explore"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")

links = []
print(soup)

This is the output:

<html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
</body>
</html>

Can someone help me work around the "403 Forbidden"?

  • Yes, it's probably bot prevention. You're writing a bot; they don't want you doing this. You should respect that. – Barmar Apr 20 '22 at 21:52
  • https://stackoverflow.com/questions/23073209/403-forbidden-output-while-using-beautifulsoup – Selman Apr 20 '22 at 21:54

1 Answer


You need to send a User-Agent header with your request, as follows:

import pandas as pd
import os
import requests
from bs4 import BeautifulSoup
import re

URL = "https://www.startengine.com/explore"
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(URL, headers=headers)
print(page)  # should now show <Response [200]> instead of the 403
soup = BeautifulSoup(page.text, "html.parser")

links = []
print(soup)
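Once the 403 is out of the way, the hrefs the question was originally after can be pulled from the soup. Here is a minimal sketch as a standalone helper (the /offering/acme path is made up for illustration; note that the real explore page renders much of its listing with JavaScript, so requests may not see every link that appears in a browser):

```python
from bs4 import BeautifulSoup

def extract_hrefs(html):
    """Return every href value found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips <a> tags that have no href attribute
    return [a['href'] for a in soup.find_all('a', href=True)]

# Works on any HTML string, e.g. page.text from the request above.
sample = '<a href="/offering/acme">Acme</a> <a>no link</a>'
print(extract_hrefs(sample))  # ['/offering/acme']
```

Passing page.text to extract_hrefs gives the list of links found in the server-rendered HTML.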
Md. Fazlul Hoque