I want to audit a train timetable. The trains carry GPS units and their positions are published at https://trenesendirecto.sofse.gob.ar/mapas/sanmartin/index.php. My plan is to scrape the train positions, check the times at which they arrive at the stations, and publish this information to all users. To obtain the train coordinates I wrote the following Python script:

import requests, random, string

# Generate a random 16-character code for the rnd parameter
def RandomGenerator():
    x = ''.join(random.choice(string.ascii_uppercase + string.ascii_lowercase + string.digits) for _ in range(16))
    return x

# URL requests
url = 'https://trenesendirecto.sofse.gob.ar/mapas/ajax_posiciones.php'

parametros = {
              'ramal':'31', 
              'rnd':RandomGenerator(),                   
              'key':'v%23v%23QTUNWp%23MpWR0wkj%23RhHTqVUM'}

encabezado = {
          'Host': 'trenes.sofse.gob.ar', 
          'Referer': 'https://trenesendirecto.sofse.gob.ar/mapas/sanmartin/index.php', 
          'X-Requested-With': 'XMLHttpRequest', 
          'Accept':'application/json, text/javascript, */*',
          'UserAgent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) \
                  Chrome/65.0.3325.146 Safari/537.36'
                }

res = requests.get(url, params = parametros, headers = encabezado, timeout=1)

# Output
print(res.url)
print(res.headers)
print(res.status_code)
print(res.content)

The output is:

https://trenesendirecto.sofse.gob.ar/mapas/ajax_posiciones.php?ramal=31&key=v%2523v%2523QTUNWp%2523MpWR0wkj%2523RhHTqVUM&rnd=ui8GObHTSpVpPqRo
{'Date': 'Tue, 13 Mar 2018 12:16:03 GMT', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Server': 'nginx'}
403
b'<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body bgcolor="white">\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n'

When I open the URL generated by requests in a browser, I get the expected output, which is exactly what I want.

Why does the script not work?

Is there any other method to obtain the data?

1 Answer

Have you tried testing the API URL in a REST client such as Postman or Mozilla's RESTClient add-on? This is the usual first step before consuming a web service from an application.

Besides, a 403 status code means you may not be authorized to access this data or do not have the right permissions set. The latter is usually the case with 403 errors, which is how they differ from a 401 error (missing or invalid credentials).

You must confirm whether the API uses Basic Auth or token-based authentication.

A plain GET request from RESTClient to this URL returns status 200 OK, which means the endpoint responds to HTTP requests but needs authorization if you want to request certain information.
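It is also worth comparing what the script sends with what the browser sends. The printed URL contains `%2523`, which suggests the key was already percent-encoded and got encoded a second time when it was passed through `params`. The sketch below reproduces that effect with the standard library's `urlencode`, which encodes query values in essentially the same way. The headers dictionary is only a guess at what the server may expect: it spells `User-Agent` with a hyphen and leaves out `Host` so that the HTTP library can derive it from the URL. Whether any of these differences is what actually triggers the 403 is an assumption; the key may also simply have expired.

```python
from urllib.parse import unquote, urlencode

# Key exactly as it appears in the question: '#' is already encoded as '%23'.
encoded_key = 'v%23v%23QTUNWp%23MpWR0wkj%23RhHTqVUM'

# Passing the already-encoded key encodes '%' again, giving '%2523'.
double_encoded = urlencode({'ramal': '31', 'key': encoded_key})

# Decoding the key first lets the encoder produce a single '%23'.
single_encoded = urlencode({'ramal': '31', 'key': unquote(encoded_key)})

# Assumed header set: 'User-Agent' spelled with a hyphen, no hand-written 'Host'.
corrected_headers = {
    'Referer': 'https://trenesendirecto.sofse.gob.ar/mapas/sanmartin/index.php',
    'X-Requested-With': 'XMLHttpRequest',
    'Accept': 'application/json, text/javascript, */*',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36',
}

print(double_encoded)
print(single_encoded)
```

In the original script this would mean passing `unquote(key)` in `parametros` and `corrected_headers` as `headers` to `requests.get`, then checking that `res.url` matches the URL the browser requests.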

amanb