
I am using Beautiful Soup to try to parse information from a webpage:

import requests

url = 'https://www.onthemarket.com/for-sale/2-bed-flats-apartments/shortlands-station/?max-bedrooms=&radius=0.5'
req = requests.get(url)

req returns <Response [403]>

With Python requests, a 403 Forbidden usually suggests a user-agent issue, but I cannot see what is wrong in my case.

Are there any suggestions?
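A quick way to see what was actually sent is to print the prepared request's headers; by default requests identifies itself with a python-requests/x.y User-Agent, which many sites block outright. A minimal diagnostic sketch:

import requests

url = 'https://www.onthemarket.com/for-sale/2-bed-flats-apartments/shortlands-station/?max-bedrooms=&radius=0.5'
req = requests.get(url)

# Inspect what came back and what was sent: the default User-Agent is
# python-requests/x.y.z, which many sites filter on
print(req.status_code)
print(req.request.headers)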

frank
  • I notice the header `cookie: logglytrackingsession=` being set in the request. The server likely denies requests without a tracking cookie, which gets set when the page is loaded in a browser (see the sketch after these comments). – clubby789 Oct 14 '19 at 21:51
  • It could be what @JammyDodger mentions, or it could be the user agent you mentioned; check the headers your browser sends when accessing the site. – luis.parravicini Oct 14 '19 at 21:53
  • @luis, it was the headers. Thanks. – frank Oct 14 '19 at 22:03
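If the tracking cookie had been the blocker, a requests.Session would be the usual fix, since it replays any cookies the server sets on later requests. A minimal sketch of that approach (the cookie name comes from the comment above; per the last comment, the actual fix here was the headers):

import requests

url = 'https://www.onthemarket.com/for-sale/2-bed-flats-apartments/shortlands-station/?max-bedrooms=&radius=0.5'

session = requests.Session()
# Any Set-Cookie from the first response (e.g. logglytrackingsession) is
# stored in the session's cookie jar and sent automatically afterwards
session.get(url)
resp = session.get(url)
print(resp.status_code)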

1 Answer


In such cases, send headers that include a User-Agent:

from bs4 import BeautifulSoup
import requests

url = 'https://www.onthemarket.com/for-sale/2-bed-flats-apartments/shortlands-station/?max-bedrooms=&radius=0.5'

# Pretend to be a browser: the default python-requests User-Agent gets rejected
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
}

html_page = requests.get(url, headers=headers).text
soup = BeautifulSoup(html_page, "html.parser")

print(soup.text)
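As a small hardening of the above, raise_for_status() fails loudly if the 403 ever comes back, instead of silently parsing an error page. This sketch reuses url and headers from the snippet above:

resp = requests.get(url, headers=headers)
resp.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx
soup = BeautifulSoup(resp.text, "html.parser")
print(soup.title)  # sanity check: the real page title, not an error page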