Let me explain the scenario:
- The website generates a session ID, exposed in the X-Session-Id response header, which is created dynamically when you visit the main (index) page. So I requested that page with a GET request and picked the value up from the response headers.
- I found a POST request that is sent automatically before you reach your desired URL, and it uses the session ID we collected earlier. Here it is: https://covid19tracker.health.ny.gov/vizql/w/NYS-COVID19-Tracker/v/NYSDOHCOVID-19Tracker-Fatalities/clear/sessions/{session id}
- Now we can call your target, which is https://covid19tracker.health.ny.gov/views/NYS-COVID19-Tracker/NYSDOHCOVID-19Tracker-Fatalities?%3Aembed=yes&%3Atoolbar=no&%3Atabs=n
- Next I noticed another XHR request to the back-end API. Before making that call, we parse the HTML content to pick up the timestamp value that makes the API return fresh data, so we get up-to-date results (think of it like a live chat). In our case it is the lastUpdatedAt value inside the HTML.
- I also noticed that we need to pick up the most recent X-Session-Id generated by our previous POST request.
- Now we make the call, using the session ID we just picked up, to https://covid19tracker.health.ny.gov/vizql/w/NYS-COVID19-Tracker/v/NYSDOHCOVID-19Tracker-Fatalities/bootstrapSession/sessions/{session}
Now we have received the full response. You can parse it or do whatever you want with it.
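The lastUpdatedAt step described above can be tried in isolation, without touching the site. This is only a minimal sketch: the sample HTML fragment and the helper names extract_last_updated and build_sticky_session_key are hypothetical, and it assumes the page embeds the value as lastUpdatedAt followed by digits and a comma. It builds the key with json.dumps so the nested featureFlags string is escaped as valid JSON; whether the server insists on that exact form is not something I have confirmed.

```python
import json
import re


def extract_last_updated(html: str) -> int:
    """Pull the lastUpdatedAt timestamp out of the page source."""
    m = re.search(r"lastUpdatedAt.+?(\d+),", html)
    if m is None:
        raise ValueError("lastUpdatedAt not found in page")
    return int(m.group(1))


def build_sticky_session_key(last_updated_at: int) -> str:
    """Serialize the stickySessionKey form field as compact JSON."""
    payload = {
        # featureFlags is itself a JSON document stored as a string
        "featureFlags": json.dumps({"MetricsAuthoringBeta": False},
                                   separators=(",", ":")),
        "isAuthoring": False,
        "isOfflineMode": False,
        "lastUpdatedAt": last_updated_at,
        "workbookId": 9,
    }
    return json.dumps(payload, separators=(",", ":"))


# Hypothetical fragment standing in for the real page source.
sample_html = '... "lastUpdatedAt":1589900000000,"workbookId":9 ...'

stamp = extract_last_updated(sample_html)
key = build_sticky_session_key(stamp)
print(key)
```

The resulting string can then be assigned to data['stickySessionKey'] in place of the hand-written template.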
import re

import requests

# Form fields sent with the bootstrapSession request.
data = {
    'worksheetPortSize': '{"w":1536,"h":1250}',
    'dashboardPortSize': '{"w":1536,"h":1250}',
    'clientDimension': '{"w":1536,"h":349}',
    'renderMapsClientSide': 'true',
    'isBrowserRendering': 'true',
    'browserRenderingThreshold': '100',
    'formatDataValueLocally': 'false',
    'clientNum': '',
    'navType': 'Reload',
    'navSrc': 'Top',
    'devicePixelRatio': '2.5',
    'clientRenderPixelLimit': '25000000',
    'allowAutogenWorksheetPhoneLayouts': 'true',
    'sheet_id': 'NYSDOH%20COVID-19%20Tracker%20-%20Fatalities',
    'showParams': '{"checkpoint":false,"refresh":false,"refreshUnmodified":false}',
    'filterTileSize': '200',
    'locale': 'en_US',
    'language': 'en',
    'verboseMode': 'false',
    ':session_feature_flags': '{}',
    'keychain_version': '1'
}


def main(url):
    # One Session keeps cookies across all of the calls below.
    with requests.Session() as req:
        # Hit the main page and read the initial session ID from the headers.
        r = req.post(url)
        sid = r.headers.get("X-Session-Id")

        # The "clear" request that is sent before the target page is loaded.
        r = req.post(
            f"https://covid19tracker.health.ny.gov/vizql/w/NYS-COVID19-Tracker/v/NYSDOHCOVID-19Tracker-Fatalities/clear/sessions/{sid}")

        # Load the target page itself.
        r = req.get(
            "https://covid19tracker.health.ny.gov/views/NYS-COVID19-Tracker/NYSDOHCOVID-19Tracker-Fatalities?%3Aembed=yes&%3Atoolbar=no&%3Atabs=n")

        # Pull lastUpdatedAt out of the HTML and build stickySessionKey from it.
        last_updated = re.search(r"lastUpdatedAt.+?(\d+),", r.text).group(1)
        sticky_key = '{"featureFlags":"{\"MetricsAuthoringBeta\":false}","isAuthoring":false,"isOfflineMode":false,"lastUpdatedAt":xxx,"workbookId":9}'.replace(
            'xxx', last_updated)
        data['stickySessionKey'] = sticky_key

        # Use the session ID returned with the target page for the final call.
        nid = r.headers.get("X-Session-Id")
        r = req.post(
            f"https://covid19tracker.health.ny.gov/vizql/w/NYS-COVID19-Tracker/v/NYSDOHCOVID-19Tracker-Fatalities/bootstrapSession/sessions/{nid}", data=data)
        print(r.text)


main("https://covid19tracker.health.ny.gov")
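As for parsing the printed response: I have not shown its format above, so treat the following as a sketch under an assumption. It assumes the body is a sequence of chunks of the form digits, a semicolon, then a JSON object, which is how bootstrapSession responses from this kind of dashboard often look. The helper name split_chunks and the sample body are hypothetical. The length prefix is only checked for being numeric and is otherwise ignored, so the sketch does not depend on whether it counts bytes or characters.

```python
import json


def split_chunks(body: str) -> list:
    """Parse '<digits>;<json><digits>;<json>...' into a list of JSON objects."""
    decoder = json.JSONDecoder()
    chunks = []
    pos = 0
    while pos < len(body):
        sep = body.find(";", pos)
        # Stop when there is no further numeric length prefix.
        if sep == -1 or not body[pos:sep].strip().isdigit():
            break
        # raw_decode returns the parsed object and the index where it ended.
        obj, end = decoder.raw_decode(body, sep + 1)
        chunks.append(obj)
        pos = end
    return chunks


# Hypothetical body standing in for r.text.
sample_body = '8;{"a": 1}13;{"b": [1, 2]}'
parts = split_chunks(sample_body)
print(len(parts))
```

If the real response does not follow this layout, split_chunks simply returns an empty list, and r.text should be inspected by hand first.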