
I am facing an issue while scraping information from a website with the requests.get method: the data I receive is inconsistent and doesn't match what the website actually displays.

As an example, I have tried to scrape the size of an apartment located at the following link:
https://www.sreality.cz/en/detail/sale/flat/2+kt/havlickuv-brod-havlickuv-brod-stromovka/3574729052. The size of the apartment is displayed as 54 square meters on the website, but when I use the requests.get method, the result shows 43 square meters instead of 54.
[Screenshots: apartment size on the webpage; apartment size in the inspected page code; result in VS Code]

I have attached screenshots of the apartment size displayed on the website and the result in my Visual Studio Code for reference. The code I used for this is given below:

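import requests

# Fetch the listing from the sreality.cz JSON API (no custom headers are sent here)
test = requests.get("https://www.sreality.cz/api/cs/v2/estates/3574729052?tms=1676140494143").json()

# The ninth entry of "items" is expected to hold the usable area
print(test["items"][8])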

I am unable to find a solution to this issue and would greatly appreciate any help or guidance. If there is anything wrong with the format of my post, please let me know and I will make the necessary changes. Thank you in advance.

Nikcus

1 Answer


Here is one way to get the information you're after:

import requests
import pandas as pd

# Show every column and the full text of each cell when printing
pd.set_option('display.max_columns', None, 'display.max_colwidth', None)

# Browser-like headers, sent with every request made through the session
headers = {
    'accept': 'application/json, text/plain, */*',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
}
s = requests.Session()
s.headers.update(headers)

url = 'https://www.sreality.cz/en/detail/sale/flat/2+kt/havlickuv-brod-havlickuv-brod-stromovka/3574729052'
property_id = url.split('/')[-1]  # the listing ID is the last URL segment
api_url = f'https://www.sreality.cz/api/en/v2/estates/{property_id}'

s.get(url)  # load the HTML page first; any cookies it sets stay in the session
# Turn the "items" list into a table and keep only the name/value columns
df = pd.json_normalize(s.get(api_url).json()['items'])
df = df[['name', 'value']]
print(df)

Result in terminal:

    name                     value
0   Total price              3 905 742
1   Update                   Yesterday
2   ID                       3574729052
3   Building                 Brick
4   Property status          Under construction
5   Ownership                Personal
6   Property location        Quiet part of municipality
7   Floor                    3. floor of total 5 including 1 underground
8   Usable area              54
9   Balcony                  4
10  Cellar                   2
11  Sales commencement date  25.04.2022
12  Water                    [{'name': 'Water', 'value': 'District water supply'}]
13  Electricity              [{'name': 'Electricity', 'value': '230 V'}]
14  Transportation           [{'name': 'Transportation', 'value': 'Train'}, {'name': 'Transportation', 'value': 'Road'}, {'name': 'Transportation', 'value': 'Urban public transportation'}, {'name': 'Transportation', 'value': 'Bus'}]
15  Road                     [{'name': 'Road', 'value': 'Asphalt'}]
16  Barrier-free access      True
17  Furnished                False
18  Elevator                 True
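Two things in the rows above can trip up a positional lookup like test["items"][8]: the position of an entry may depend on which fields a particular listing includes, and some values (Water, Electricity, Transportation, Road) are themselves lists of {'name': ..., 'value': ...} dicts. A minimal sketch, assuming the "items" structure shown in the output above, that maps each name to its value and joins nested values into one string. The helper name items_to_dict is hypothetical, and the sample data only imitates a few rows of the output (the real value types may differ):

```python
from typing import Any


def items_to_dict(items: list[dict[str, Any]]) -> dict[str, Any]:
    """Map each item's 'name' to its 'value' (hypothetical helper).

    Values that are lists of {'name': ..., 'value': ...} dicts, such as
    'Transportation', are joined into a single comma-separated string.
    """
    result: dict[str, Any] = {}
    for item in items:
        value = item.get("value")
        if isinstance(value, list):
            # Nested entries: keep only their 'value' fields
            value = ", ".join(str(sub.get("value")) for sub in value)
        result[item["name"]] = value
    return result


# Sample data imitating a few rows of the output above (not a live API call)
sample_items = [
    {"name": "Usable area", "value": "54"},
    {"name": "Transportation", "value": [
        {"name": "Transportation", "value": "Train"},
        {"name": "Transportation", "value": "Road"},
    ]},
    {"name": "Elevator", "value": True},
]

details = items_to_dict(sample_items)
print(details["Usable area"])     # 54
print(details["Transportation"])  # Train, Road
```

With a real response, items_to_dict(s.get(api_url).json()['items'])['Usable area'] would replace the hard-coded index. Note that with the /api/cs/v2/ endpoint from the question, the names would presumably be Czech rather than English.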
Barry the Platipus
  • Hi Barry, thanks a lot for your quick response! Do you happen to know what the issue with my solution was? – Nikcus Feb 11 '23 at 19:24
  • You're welcome, @Nikcus. Not sure what your issue was, tbh; I could not reproduce it on my end. My solution works, though. – Barry the Platipus Feb 11 '23 at 19:41
  • Hmm, I see, I thought it was reproducible... Yes, your solution works! I added the headers you provided and managed to make my code work as well, so thanks a million:
    headers = {
        'accept': 'application/json, text/plain, */*',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
    }
    s = requests.Session()
    s.headers.update(headers)
    – Nikcus Feb 11 '23 at 20:11
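The comment above indicates that what fixed the original code was sending the browser-like accept and User-Agent headers. The same idea, sketched with the standard library's urllib.request instead of requests so it needs no extra packages; the function names build_estate_request and fetch_estate are hypothetical, and the header values are copied from the answer:

```python
import json
import urllib.request

# Header values copied from the answer above
HEADERS = {
    "accept": "application/json, text/plain, */*",
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
    ),
}


def build_estate_request(estate_id: str, lang: str = "en") -> urllib.request.Request:
    """Build (but do not send) a GET request for one listing in the JSON API."""
    url = f"https://www.sreality.cz/api/{lang}/v2/estates/{estate_id}"
    return urllib.request.Request(url, headers=HEADERS)


def fetch_estate(estate_id: str, lang: str = "en") -> dict:
    """Send the request and decode the JSON body (performs network I/O)."""
    req = build_estate_request(estate_id, lang)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Usage would be something like fetch_estate("3574729052")["items"]. Building the request separately from sending it keeps the header setup easy to inspect without touching the network.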