I'm a novice programmer, so apologies in advance if what I'm writing is badly worded or just plain stupid.
I'm trying to scrape information from a website and store the results in a database. The goal is to get all the train numbers and stations, and to see whether each train is late. I started by looping over the alphabet and building this URL with each letter substituted for $LETTER, one at a time: https://reservia.viarail.ca/GetStations.aspx?q=$LETTER
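A minimal sketch of that station-search loop, assuming the endpoint accepts a single lowercase letter in the `q` parameter (the letter case is an assumption). It only builds the URLs; `station_search_urls` is a hypothetical helper name, and fetching and parsing are left out.

```python
import string
from urllib.parse import urlencode

# Endpoint quoted in the question; "q" carries the search letter.
STATIONS_URL = "https://reservia.viarail.ca/GetStations.aspx"


def station_search_urls(letters=string.ascii_lowercase):
    """Return one GetStations.aspx URL per letter (...?q=a, ...?q=b, ...)."""
    # urlencode escapes the value properly instead of pasting it into the string.
    return [f"{STATIONS_URL}?{urlencode({'q': letter})}" for letter in letters]


if __name__ == "__main__":
    for url in station_search_urls():
        print(url)
```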
I then parse the results and store everything in a database, which works fine. This script doesn't take long to run, so that's not an issue. The problem comes when I try to get all the trains that pass through each station. To do this, I go through every station stored previously (580 of them) and request this URL, replacing $DATE with today's date in YYYY-MM-DD format and $CODE with the station code:
reservia.viarail.ca/tsi/GetTrainList.aspx?OriginStationCode=$CODE&Date=$DATE
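Building the per-station URL is less error-prone with `urllib.parse.urlencode` and `datetime.date.isoformat()`, which already returns the YYYY-MM-DD form. This is a sketch: `train_list_url` is a hypothetical helper, and the `https://` prefix is an assumption because the quoted URL has no scheme.

```python
import datetime
from urllib.parse import urlencode

# Endpoint from the question; the https scheme is assumed.
TRAIN_LIST_URL = "https://reservia.viarail.ca/tsi/GetTrainList.aspx"


def train_list_url(station_code, date=None):
    """Build the GetTrainList.aspx URL for one station on one date (default: today)."""
    if date is None:
        date = datetime.date.today()
    # isoformat() yields YYYY-MM-DD, the format used for the Date parameter.
    query = urlencode({"OriginStationCode": station_code, "Date": date.isoformat()})
    return f"{TRAIN_LIST_URL}?{query}"
```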
So, for example, I would use this link for Montreal, then go through each row of the table, read the train number, and insert it into a database table. That's my plan so far, but the script takes far too long to run (over 7 minutes), which makes sense since it's opening 580 pages.
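The question doesn't show the page's markup, so this sketch assumes an HTML table whose first `<td>` in each row holds the train number, and stores results in SQLite. `FirstCellParser`, `create_schema`, `store_train_numbers` and the `trains` table are all hypothetical; adapt them to the real page and your existing database.

```python
import sqlite3
from html.parser import HTMLParser


class FirstCellParser(HTMLParser):
    """Collect the text of the first <td> in every <tr> (assumed: the train number)."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._in_row = False
        self._cell_index = -1
        self._capturing = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._in_row = True
            self._cell_index = -1
        elif tag == "td" and self._in_row:
            self._cell_index += 1
            if self._cell_index == 0:  # only capture the first cell of the row
                self._capturing = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._capturing:
            self._capturing = False
            text = "".join(self._buffer).strip()
            if text:
                self.values.append(text)
        elif tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)


def create_schema(conn):
    # One row per (station, train); the primary key stops re-runs from duplicating rows.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trains ("
        "station_code TEXT, train_number TEXT, "
        "PRIMARY KEY (station_code, train_number))"
    )


def store_train_numbers(conn, station_code, html):
    """Parse one station's page and insert its train numbers; returns the numbers found."""
    parser = FirstCellParser()
    parser.feed(html)
    parser.close()
    conn.executemany(
        "INSERT OR IGNORE INTO trains (station_code, train_number) VALUES (?, ?)",
        [(station_code, number) for number in parser.values],
    )
    conn.commit()
    return parser.values
```

Header rows built from `<th>` cells are skipped automatically, since only `<td>` cells are captured.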
What's a better way of doing this? I'm using Python because I'm trying to learn it, so I've been importing the urllib library to fetch and decode each page, and then sorting through the data. Thanks for any suggestions or help!
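One common way to speed this up is to fetch pages concurrently instead of one at a time: the work is presumably dominated by waiting on the network, so a thread pool from `concurrent.futures` lets several requests be in flight at once. This is a sketch, not a tuned solution; `fetch` and `fetch_all` are hypothetical names, and the UTF-8 fallback and the default of 8 workers are assumptions (a modest worker count keeps the load on the server reasonable).

```python
import concurrent.futures
import urllib.request


def fetch(url, timeout=30):
    """Download one page and decode it, falling back to UTF-8 if no charset is sent."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(urls, fetch_page=fetch, max_workers=8):
    """Fetch many URLs concurrently; returns {url: page_text or the exception raised}."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_url = {pool.submit(fetch_page, url): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = exc
    return results
```

A usage pattern that fits this design: call `fetch_all` on the list of 580 GetTrainList.aspx URLs, then parse each page and write to the database from the main thread, since a `sqlite3` connection by default may only be used from the thread that created it.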