I want to audit the train timetable. The trains have GPS, and their positions are published at https://trenesendirecto.sofse.gob.ar/mapas/sanmartin/index.php. My plan is to scrape the train positions, check the times at which they arrive at the stations, and publish this information to all users. To obtain the train coordinates I wrote the following Python script:

```python
import random
import string

import requests


# Generate a random 16-character alphanumeric code for the 'rnd' parameter
def RandomGenerator():
    x = ''.join(random.choice(string.ascii_uppercase + string.ascii_lowercase + string.digits) for _ in range(16))
    return x


# URL request
url = 'https://trenesendirecto.sofse.gob.ar/mapas/ajax_posiciones.php'
parametros = {
    'ramal': '31',
    'rnd': RandomGenerator(),
    'key': 'v%23v%23QTUNWp%23MpWR0wkj%23RhHTqVUM'}
encabezado = {
    'Host': 'trenes.sofse.gob.ar',
    'Referer': 'https://trenesendirecto.sofse.gob.ar/mapas/sanmartin/index.php',
    'X-Requested-With': 'XMLHttpRequest',
    'Accept': 'application/json, text/javascript, */*',
    'UserAgent': ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/65.0.3325.146 Safari/537.36')
}
res = requests.get(url, params=parametros, headers=encabezado, timeout=1)

# Output
print(res.url)
print(res.headers)
print(res.status_code)
print(res.content)
```
The output is:

```
https://trenesendirecto.sofse.gob.ar/mapas/ajax_posiciones.php?ramal=31&key=v%2523v%2523QTUNWp%2523MpWR0wkj%2523RhHTqVUM&rnd=ui8GObHTSpVpPqRo
{'Date': 'Tue, 13 Mar 2018 12:16:03 GMT', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Server': 'nginx'}
403
b'<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body bgcolor="white">\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n'
```
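One detail is visible in the printed URL: the `key` value in the script is already percent-encoded (`%23` is `#`), and requests percent-encodes `params` again, so each `%` became `%25` and the query string carries `v%2523v%2523...`. The sketch below reproduces this with the standard library's `urllib.parse.urlencode` (the same encoding step requests applies to `params`) and shows that decoding the key first yields a single-encoded value. Because the browser reportedly accepts the double-encoded URL, this may not be what triggers the 403; it is only a way to make the request match what the page itself sends.

```python
from urllib.parse import unquote, urlencode

# Key exactly as written in the script: already percent-encoded ('#' -> '%23').
pre_encoded_key = 'v%23v%23QTUNWp%23MpWR0wkj%23RhHTqVUM'

# Encoding it again turns every '%' into '%25', matching the '%2523' seen in res.url.
double = urlencode({'key': pre_encoded_key})

# Decoding first and letting the library encode once gives the single-encoded form.
raw_key = unquote(pre_encoded_key)
single = urlencode({'key': raw_key})

print(double)  # key=v%2523v%2523QTUNWp%2523MpWR0wkj%2523RhHTqVUM
print(single)  # key=v%23v%23QTUNWp%23MpWR0wkj%23RhHTqVUM
```

In the script, this would mean passing `raw_key` (that is, `'v#v#QTUNWp#MpWR0wkj#RhHTqVUM'`) in `parametros` and letting requests do the encoding.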
When I open the same URL generated by requests in the browser, I get exactly the output I want.

Why does the script not work?

Is there any other way to obtain the data?
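The second half of the plan, checking when each train reaches a station, can be worked on independently of the request problem. A minimal sketch, assuming each scraped fix can be reduced to `(timestamp, latitude, longitude)` for one train (the actual JSON layout returned by `ajax_posiciones.php` is not shown here) and that station coordinates are known. The station names, coordinates and the 150 m radius are hypothetical placeholders. An arrival is recorded at the first fix within the radius of a station, using the haversine great-circle distance.

```python
import math

# Hypothetical station coordinates (name -> (lat, lon)); replace with real data.
STATIONS = {
    'Station A': (-34.5881, -58.3748),
    'Station B': (-34.5955, -58.4018),
}

ARRIVAL_RADIUS_M = 150  # assumed distance threshold for "at the station"


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6_371_000  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))


def detect_arrivals(positions, stations=STATIONS, radius_m=ARRIVAL_RADIUS_M):
    """Return (station, timestamp) for the first fix inside each station's radius.

    positions: iterable of (timestamp, lat, lon) for one train, in time order.
    A station is reported again only after the train has left its radius.
    """
    arrivals = []
    inside = set()  # stations whose radius the train is currently within
    for ts, lat, lon in positions:
        for name, (slat, slon) in stations.items():
            near = haversine_m(lat, lon, slat, slon) <= radius_m
            if near and name not in inside:
                arrivals.append((name, ts))
                inside.add(name)
            elif not near:
                inside.discard(name)
    return arrivals


# Made-up fixes for one train: two at Station A, then one at Station B.
sample = [
    ('12:00:00', -34.5881, -58.3748),
    ('12:00:30', -34.5882, -58.3749),
    ('12:04:00', -34.5955, -58.4018),
]
arrivals = detect_arrivals(sample)
print(arrivals)  # [('Station A', '12:00:00'), ('Station B', '12:04:00')]
```

Comparing these timestamps with the published timetable would then give the delay per station; the radius trades off GPS noise against how precisely "arrival" is timed.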