The data you are looking for seems to be hidden in a script block at the end of the raw HTML.
You can try something like this:
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
from pandas import json_normalize
url = 'https://www.racingpost.com'
res = requests.get(url).text
raw = res.split('cardsMatrix":{"courses":')[1].split(',"date":"2020-03-06","heading":"Tomorrow\'s races"')[0]
data = json.loads(raw)
df = json_normalize(data)
Output:
   id    abandoned  allWeather  surfaceType  colour  name           countryCode  meetingUrl                                  hashName          meetingTypeCode  races
0  1083  False      True        Polytrack    3       Chelmsford     GB           /racecards/1083/chelmsford-aw/2020-03-06    chelmsford-aw     Flat             [{'id': 753047, 'abandoned': False, 'result': ...
1  1212  False      False                    4       Ffos Las       GB           /racecards/1212/ffos-las/2020-03-06         ffos-las          Jumps            [{'id': 750498, 'abandoned': False, 'result': ...
2  1138  False      True        Polytrack    11      Dundalk        IRE          /racecards/1138/dundalk-aw/2020-03-06       dundalk-aw        Flat             [{'id': 753023, 'abandoned': False, 'result': ...
3  513   False      True        Tapeta       5       Wolverhampton  GB           /racecards/513/wolverhampton-aw/2020-03-06  wolverhampton-aw  Flat             [{'id': 750658, 'abandoned': False, 'result': ...
4  565   False      False                    0       Jebel Ali      UAE          /racecards/565/jebel-ali/2020-03-06         jebel-ali         Flat             [{'id': 753155, 'abandoned': False, 'result': ...
5  206   False      False                    0       Deauville      FR           /racecards/206/deauville/2020-03-06         deauville         Flat             [{'id': 753186, 'abandoned': False, 'result': ...
6  54    True       False                    1       Sandown        GB           /racecards/54/sandown/2020-03-06            sandown           Jumps            [{'id': 750510, 'abandoned': True, 'result': F...
7  30    True       False                    2       Leicester      GB           /racecards/30/leicester/2020-03-06          leicester         Jumps            [{'id': 750501, 'abandoned': True, 'result': F...
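Caveat: the end marker passed to the second split (the string containing the date and heading) is hard-coded, so you have to look it up in the raw HTML yourself and adjust it to whatever the page currently contains.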
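One way to avoid the hand-picked end marker is to let the JSON decoder work out where the value ends. The following is only a sketch under assumptions: `extract_json_after` is a hypothetical helper name, and `sample` is a made-up stand-in for the page source. The only library call relied on is the standard library's `json.JSONDecoder.raw_decode`, which parses a single JSON value starting at a given offset and ignores whatever follows it.

```python
import json

def extract_json_after(text, marker):
    """Parse the JSON value that starts right after `marker` in `text`.

    Hypothetical helper for illustration; raises ValueError if the
    marker is missing or no valid JSON follows it.
    """
    start = text.index(marker) + len(marker)
    # raw_decode does not skip leading whitespace, so do it here
    while start < len(text) and text[start].isspace():
        start += 1
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value

# Made-up stand-in for the raw HTML returned by requests
sample = ('<script>window.x = {"cardsMatrix":{"courses":[{"id":1083,"name":"Chelmsford"}]'
          ',"date":"2020-03-06","heading":"Tomorrow\'s races"}};</script>')

courses = extract_json_after(sample, 'cardsMatrix":{"courses":')
```

With the real page you would pass `res` instead of `sample`; the start marker still has to match what the page contains.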
Edit: a more robust solution.
To grab the whole script block and parse it from there, try this code:
url = 'https://www.racingpost.com'
res = requests.get(url).content
soup = BeautifulSoup(res, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning
# salient data seems to be in 20th script block
data = soup.find_all("script")[19].text
clean = data.split('window.__PRELOADED_STATE = ')[1].split(";\n")[0]
clean = json.loads(clean)
clean.keys()
Output:
['stories', 'bookmakers', 'panelTemplate', 'cardsMatrix', 'advertisement']
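The fixed index `[19]` will stop working if the page gains or loses a script tag. A possible alternative, sketched here under assumptions, is to pick the script block by its content instead of its position. `find_state_script` is a hypothetical helper, and the `scripts` list is a made-up stand-in for the texts of the page's script tags; the marker and the split logic are the ones used above.

```python
import json

MARKER = "window.__PRELOADED_STATE = "

def find_state_script(script_texts, marker=MARKER):
    """Return the first script text that contains `marker` (hypothetical helper)."""
    for text in script_texts:
        if marker in text:
            return text
    raise ValueError("no script block contains the marker")

# Made-up stand-in for [s.text for s in soup.find_all("script")]
scripts = [
    "var a = 1;",
    'window.__PRELOADED_STATE = {"cardsMatrix": {"courses": []}};\n',
    "var b = 2;",
]

data = find_state_script(scripts)
clean = json.loads(data.split(MARKER)[1].split(";\n")[0])
```

With the soup from above, the call would presumably look like `find_state_script(s.text for s in soup.find_all("script"))`.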
Then retrieve, e.g., the data saved under the key cardsMatrix:
parsed = clean["cardsMatrix"]["courses"]  # list with one dict per course
pd.DataFrame(parsed)
Output (the same table as above, obtained with the more robust solution):
   id    abandoned  allWeather  surfaceType  colour  name           countryCode  meetingUrl                                  hashName          meetingTypeCode  races
0  1083  False      True        Polytrack    3       Chelmsford     GB           /racecards/1083/chelmsford-aw/2020-03-06    chelmsford-aw     Flat             [{'id': 753047, 'abandoned': False, 'result': ...
1  1212  False      False                    4       Ffos Las       GB           /racecards/1212/ffos-las/2020-03-06         ffos-las          Jumps            [{'id': 750498, 'abandoned': False, 'result': ...
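The races column still holds a nested list of dicts per course. If you want one row per race, a plain-Python flattening step could look like the sketch below. The `courses` list is a trimmed sample built only from keys and values visible in the output above, and the `course_`/`race_` prefixes on the output keys are my own naming, chosen because both levels have an `id` and an `abandoned` field.

```python
# Trimmed sample in the shape of clean["cardsMatrix"]["courses"]
courses = [
    {"id": 1083, "name": "Chelmsford",
     "races": [{"id": 753047, "abandoned": False}]},
    {"id": 54, "name": "Sandown",
     "races": [{"id": 750510, "abandoned": True}]},
]

# One flat dict per race, carrying the parent course's id and name along
rows = [
    {
        "course_id": course["id"],
        "course_name": course["name"],
        "race_id": race["id"],
        "race_abandoned": race["abandoned"],
    }
    for course in courses
    for race in course["races"]
]
```

The resulting list can be passed straight to `pd.DataFrame(rows)`.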