
I'm writing a wrapper for Deutsche Bahn's Fahrplan OpenData API.

However, I cannot get the same result in Python as with a simple curl request. Here is what I'm doing:

>>> import requests
>>> header = {'Authorization': 'Bearer 36e39957ace6f405a82cfb09522d0a8d'}
>>> departure_data = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/departureBoard/8011160?date=2017-06-30', headers=header)

>>> # Now, using a journey's detailsId, let's request the journey details from the endpoint
>>> requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + departure_data.json()[0]['detailsId'], headers=header)
<Response [404]>
>>> requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + departure_data.json()[0]['detailsId'], headers=header).request.url
'https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'

Alright, so far, so bad. As you can see, I'm using the data exactly as it was given to me. Now, when I call the endpoint via the website, it tells me it runs this curl command:

curl -X GET --header "Accept: application/json" --header "Authorization: Bearer 36e39957ace6f405a82cfb09522d0a8d" "https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/782334%252F275830%252F795514%252F136979%252F80%253fstation_evaId%253D8098160"

And this bit of magic happens:

the original journey ID

'782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'

becomes:

'782334%252F275830%252F795514%252F136979%252F80%253fstation_evaId%253D8098160'

and the request returns status 200.

Seemingly out of nowhere, the journey ID got some characters added to it. I copied and pasted it into the given field and did nothing else, so I know it wasn't me.

I believe some sort of encoding/decoding is happening, but I've never seen this before and honestly don't know what to make of it.

How do I handle this in my code? Do I need to do something in addition to simply parsing the response of the departures endpoint? Or, better yet, am I simply missing something obvious?

I've sent multiple emails to the DB developers, but so far I have not heard back from them.

deepbrook

2 Answers


What you see is double URL encoding. The percent sign % is itself URL-encoded as the sequence %25:

/ -> %2F -> %252F
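A quick way to see this in Python 3 is to apply `urllib.parse.quote` (the Python 3 counterpart of Python 2's `urllib.quote`) once and then twice. `safe=''` is passed so that `/` gets encoded as well, since `quote` leaves `/` alone by default. This only illustrates the encoding arithmetic; it says nothing about how the API produces its IDs.

```python
from urllib.parse import quote, unquote

once = quote('/', safe='')    # '/'   -> '%2F'
twice = quote(once, safe='')  # '%2F' -> '%252F': only the '%' changes, to '%25'
print(once, twice)            # %2F %252F

# Decoding removes exactly one layer per call
print(unquote(twice))           # %2F
print(unquote(unquote(twice)))  # /
```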

Try to URL-decode departure_data.json()[0]['detailsId'] before you do the following:

>>> requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + departure_data.json()[0]['detailsId'], headers=header)

For example, like this (Python 2, where unquote lives directly in the urllib module):

import urllib
requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + urllib.unquote(urllib.unquote(departure_data.json()[0]['detailsId'])), headers=header)
Igor
  • I think what is actually needed is `urllib.quote`, not `urllib.unquote` (it looks like the correct string to put in the URL is encoded twice). Also, in Python 3 these functions are `urllib.parse.quote` and `urllib.parse.unquote`. – jdehesa Jun 30 '17 at 10:59
  • Nice! Thought you were using Python 2, but anyway glad you figured that out. – Igor Jun 30 '17 at 14:34

In v1 of the API, there are four endpoints defined:

GET /location/{name}
GET /arrivalBoard/{id}
GET /departureBoard/{id}
GET /journeyDetails/{id}

Each of them expects an {id} parameter. The value you give this parameter must be URL-encoded, which is something you neglected to do.
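As a sketch of that rule, a small helper could build the URL for any of the four endpoints and percent-encode the `{id}` value with `urllib.parse.quote(..., safe='')`, so that `%`, `/` or `?` inside the value cannot change the URL's structure. The name `endpoint_url` and the `BASE_URL`/`ENDPOINTS` constants are hypothetical; the base URL is the one used in the question.

```python
from urllib.parse import quote

BASE_URL = 'https://api.deutschebahn.com/fahrplan-plus/v1'  # taken from the question
ENDPOINTS = {'location', 'arrivalBoard', 'departureBoard', 'journeyDetails'}


def endpoint_url(endpoint, value):
    """Hypothetical helper: build GET /{endpoint}/{id} with {id} percent-encoded."""
    if endpoint not in ENDPOINTS:
        raise ValueError('unknown endpoint: %r' % endpoint)
    # safe='' so that '/' inside the value is encoded too (quote keeps '/' by default)
    return '%s/%s/%s' % (BASE_URL, endpoint, quote(str(value), safe=''))


details_id = '782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'
print(endpoint_url('journeyDetails', details_id))
# https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/782334%252F275830%252F795514%252F136979%252F80%253fstation_evaId%253D8098160
```

The printed URL is the same one the website's curl command uses. Query parameters such as `date` for the boards are a separate concern; with requests they can be passed through the `params` argument of `requests.get` instead of being concatenated by hand.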

/departureBoard/{id} gives you a list of Board items, which are defined like so:

Board {
    name (string): ,
    type (string): ,
    boardId (string): ,
    stopId (string): ,
    stopName (string): ,
    dateTime (string): ,
    origin (string): ,
    track (string): ,
    detailsId (string):
}
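If you want typed access to these items in your wrapper, one option is a dataclass mirroring the schema above. This is a sketch that assumes the JSON keys are exactly the field names listed and that all values are strings; `Board.from_json` is a hypothetical helper that ignores keys not in the schema and raises `TypeError` if one is missing. The sample item is made up for illustration; only its `detailsId` comes from the question.

```python
from dataclasses import dataclass, fields


@dataclass
class Board:
    name: str
    type: str
    boardId: str
    stopId: str
    stopName: str
    dateTime: str
    origin: str
    track: str
    detailsId: str

    @classmethod
    def from_json(cls, item):
        # Keep only the keys defined in the schema; a missing key makes cls(...) raise TypeError
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in item.items() if k in known})


# Made-up item with placeholder values
sample = {key: 'placeholder' for key in
          ('name', 'type', 'boardId', 'stopId', 'stopName', 'dateTime', 'origin', 'track')}
sample['detailsId'] = '782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'
sample['unexpected'] = 'ignored'

board = Board.from_json(sample)
print(board.detailsId)
```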

The detailsId is what you can use to hit the /journeyDetails/{id} endpoint. So the minimal working code looks like this (note the call to urllib.parse.quote):

import requests
import urllib.parse  # "import urllib" alone does not guarantee that urllib.parse is loaded

header = {'Authorization': 'Bearer 36e39957ace6f405a82cfb09522d0a8d'}
departure_data = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/departureBoard/8011160?date=2017-06-30', headers=header)

# detailsId is opaque: percent-encode all of it (safe='' also encodes '/') before putting it in the path
journey_id = departure_data.json()[0]['detailsId']
journey_details = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + urllib.parse.quote(journey_id, safe=''), headers=header)

The value of journey_id is itself URL-encoded and decodes to something that looks like part of a URL (a path followed by a query string):

urllib.parse.unquote(journey_id)
# -> '564552/203236/867650/245641/80?station_evaId=8098160'

So it may look as if you could simply append the value to the URL as-is to make further requests, but that's a misconception.

Treat the ID as an opaque plain text value that you need to encode, like you would encode any other arbitrary value before using it in a URL.

When you quote the value, each percent sign is escaped as %25, which leads to the longer value:

'564552%2F203236%2F867650%2F245641%2F80%3fstation_evaId%3D8098160'
'564552%252F203236%252F867650%252F245641%252F80%253fstation_evaId%253D8098160'
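A round trip shows why quoting is the right direction: quote the ID as the client would, then decode it once, and you get the opaque ID back unchanged. The single `unquote` stands in for the server decoding the path segment once; that is a simplifying assumption about the server, not documented API behaviour. The ID is the one from the example above.

```python
from urllib.parse import quote, unquote

journey_id = '564552%2F203236%2F867650%2F245641%2F80%3fstation_evaId%3D8098160'

encoded = quote(journey_id, safe='')  # every '%' becomes '%25'
print(encoded)  # 564552%252F203236%252F867650%252F245641%252F80%253fstation_evaId%253D8098160

# Assumption: the server percent-decodes the path segment exactly once
received = unquote(encoded)
print(received == journey_id)  # True
```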

Since the Deutsche Bahn API is self-documenting through Swagger, it might be easiest to install a Swagger client and let it create an API wrapper for you (see their swagger.json). pyswagger looks usable, but there are others to try.

That way you can concentrate on making API requests and getting data, while the low-level plumbing, such as URL-encoding and even authorization, happens transparently in the background.

Tomalak
  • When you say it *must* be urlencoded, is that something that is simply known or did you read that somewhere in the documentation? – deepbrook Jun 30 '17 at 13:36
  • Well. You cannot place a plain text value into a URL, i.e. create a URL by string concatenation, without the risk of breaking it. It's still often done and often it works because the urlencoded version of the value looks exactly like the plain version, but it's still always wrong because the URL breaks as soon as the value in question contains characters that are special to a URL, like the `%` sign in this case. Incorrectly encoded data makes the server mis-interpret the request. – Tomalak Jun 30 '17 at 14:43
  • When I see that the server's endpoint is `GET /journeyDetails/{id}`, it's immediately obvious to me that `{id}` is a moving target and whatever you put in there must be escaped properly. – Tomalak Jun 30 '17 at 14:50
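To make the point about special characters concrete: parsing a naively concatenated URL with `urllib.parse.urlsplit` shows that the decoded ID's `/` and `?` change the URL's structure (extra path segments plus a query string), while the quoted ID stays a single path segment. This only shows how a standard URL parser reads the two strings; it does not demonstrate how the Deutsche Bahn server routes them.

```python
from urllib.parse import quote, urlsplit

base = 'https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/'
raw_id = '782334/275830/795514/136979/80?station_evaId=8098160'  # the question's detailsId, decoded

naive = urlsplit(base + raw_id)
quoted = urlsplit(base + quote(raw_id, safe=''))

print(naive.path)    # /fahrplan-plus/v1/journeyDetails/782334/275830/795514/136979/80
print(naive.query)   # station_evaId=8098160
print(quoted.path)   # /fahrplan-plus/v1/journeyDetails/782334%2F275830%2F795514%2F136979%2F80%3Fstation_evaId%3D8098160
print(quoted.query)  # prints an empty line: no query string
```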