As part of learning web scraping, I'd like to browse and retrieve all of my Strava activities. I'll use the profile of Thibaut Pinot as an example. I'm using Python 3 and Requests.
On the user's page, one can see all of his activities, but not all at once. They are sorted chronologically, so you have to go through a timeline. You can choose to display activities weekly or monthly and pick the period of time: all of this is done by GET requests. More precisely, the fragment identifier matches the following regexp:
(interval_type|graph_date_range)\?chart_type=miles&interval_type=(week|month)&interval=[0-9]{6}&year_offset=[0-9]+
The first group doesn't seem to matter at all. Then, interval_type specifies whether to display weekly or monthly results. interval selects the date to display, using the format YYYYMM, where YYYY is the year and MM is the month or week number. Finally, year_offset isn't really useful. Thus, the GET request is fairly straightforward to make: I just have to choose a monthly display and iterate over the months I want to monitor.
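To make the iteration concrete, here is a minimal sketch of how the monthly URLs could be generated. The helper names month_intervals and timeline_url are hypothetical, and the parameter order simply copies the example URL used in this question; only the standard library is needed.

```python
from urllib.parse import urlencode

# Profile page used as the example in this question.
BASE_URL = "https://www.strava.com/pros/1603067"


def month_intervals(start_year, start_month, end_year, end_month):
    """Yield YYYYMM strings for every month in the inclusive range."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield f"{year:04d}{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def timeline_url(interval, base_url=BASE_URL):
    """Build the URL whose fragment asks for a monthly display of `interval`."""
    params = {
        "interval": interval,
        "interval_type": "month",
        "chart_type": "miles",
        "year_offset": 0,  # assumed constant, since it does not seem to matter
    }
    return f"{base_url}#interval_type?{urlencode(params)}"


# One URL per month, from November 2017 to February 2018.
urls = [timeline_url(i) for i in month_intervals(2017, 11, 2018, 2)]
for url in urls:
    print(url)
```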
However, when loading https://www.strava.com/pros/1603067#interval_type?interval=201802&interval_type=month&chart_type=miles&year_offset=0 (that is, the page that displays the runs of February 2018), the results of the current month are displayed first, and only then the results of February 2018. Thus, requests.get always gives me the same page, no matter what fragment identifier I set.
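A likely explanation for this can be checked with the standard library: everything after the # is a fragment, and an HTTP client keeps the fragment to itself instead of sending it to the server. The sketch below splits the example URL to show which part would actually be requested. The variable request_target is just an illustrative name, and the idea that the page's JavaScript reads the fragment afterwards is my assumption, not something I have verified.

```python
from urllib.parse import urlsplit

url = (
    "https://www.strava.com/pros/1603067"
    "#interval_type?interval=201802&interval_type=month"
    "&chart_type=miles&year_offset=0"
)

parts = urlsplit(url)

# The "?" comes after the "#", so it belongs to the fragment, not to a query string.
print("path:    ", parts.path)
print("query:   ", repr(parts.query))
print("fragment:", parts.fragment)

# What an HTTP client puts on the request line: path plus query, never the fragment.
request_target = parts.path + ("?" + parts.query if parts.query else "")
print("requested:", request_target)
```

If that is right, every month in my loop asks the server for the same resource, which would explain why the response never changes.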
My web browser must fetch a new page after the first one (the one with the current month) is loaded, but how can I get it using Python?