What are some of the problems with GTFS?

Question

I am interested in replacing my current data format with GTFS, but I keep hearing and reading that the GTFS file format has flaws.

The complaint I see most often is that GTFS cannot represent delays or other real-time information. People say you can't get the "whole picture" from it.

So I am asking whether anyone more experienced with GTFS (I am seeing it for the first time) has used it in an application and could describe the problems they faced during development.

Maybe someone has a suggestion about a better kind of file format? Or a combination of some formats?

score 2 · Accepted Answer · 2015-04-23T10:32:27.570

It's hard to say whether GTFS is a good fit or not for your application without knowing what your application's requirements are, but I can offer a few remarks.

If your goal is to provide real-time data to users you should take a look at GTFS-realtime, a complementary data format designed specifically for issuing real-time updates. For most public-transit applications, using a GTFS and a GTFS-realtime feed together does indeed give the "whole picture" about a transit network, or near enough.

In terms of GTFS itself, my main complaint is that it seems designed specifically for route-planning applications and using data in this format for any other purpose can be difficult. For example, while a GTFS feed records information about transit stops and routes, there is no requirement that each of these have a single, canonical entry—if the data spans multiple board periods, there will almost always be (seemingly) duplicate entries for each.

This doesn't matter if you're plotting a route based on where and when a person is travelling, since the links between objects ensure you'll always generate the right result. If you're starting with only a person's location and want to know, "What transit resources are available nearby?", reliably producing an accurate answer requires some contortions.
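To make the "what is nearby?" case concrete, here is a minimal Python sketch, assuming the rows of `stops.txt` have already been loaded as dictionaries (for example with `csv.DictReader`). The function names and the choice to collapse records on `stop_code` (falling back to `stop_id`) are my own assumptions, not something the GTFS specification prescribes. It only hides apparent duplicates; it does not decide which record is valid on a given date.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def nearest_stops(stops, lat, lon, k=5):
    """Return up to k stop records, nearest first, one per stop code.

    Assumption: records sharing a stop_code describe the same physical
    stop, so only the closest record for each code is kept.
    """
    best = {}
    for stop in stops:
        key = stop.get("stop_code") or stop["stop_id"]
        dist = haversine_m(lat, lon,
                           float(stop["stop_lat"]), float(stop["stop_lon"]))
        if key not in best or dist < best[key][0]:
            best[key] = (dist, stop)
    ranked = sorted(best.values(), key=lambda pair: pair[0])
    return [stop for _, stop in ranked[:k]]
```

Working out which of several records is actually in service would still require going back through `stop_times.txt`, `trips.txt` and the calendar files.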

edited Apr 23 '15 at 10:32

answered Apr 23 '15 at 10:26

First of all, thank you for answering. Everything was clear until the last paragraph. To help me understand, can you tell me, for example, what the criteria would be for defining "What transit resources are available nearby?" – dimrizo Apr 23 '15 at 12:48
Here's an example. Suppose you want to answer, "What are the five transit stops closest to my current location?" Using only GTFS data this is harder to answer than it appears, simply because a single transit stop may have multiple records corresponding to it in the `stops` table. You have to determine which record is valid for the time at which you're issuing the query, but there's no simple "valid-on" field defined in GTFS—you're forced to work backwards from the schedule information the feed provides. Note, though, that none of this is an issue if you're only generating trip plans. – Apr 23 '15 at 12:56
Wait Simon, what if I search the stops.txt file using Euclidean distance or Manhattan distance to calculate this? Why would that be wrong? Or you could run some kind of heuristic algorithm like Dijkstra between the points. In stops.txt each stop appears only once. – dimrizo Apr 23 '15 at 14:14
Ordering stops by distance is the easy part; the harder part is distinguishing among multiple results for the same stop code. In the feed you're looking at there is only one record per stop, but the standard doesn't guarantee this and in other feeds this will not be the case. – Apr 23 '15 at 14:59
Ah ok, the problem is when the feeds overlap. Got it. So to get the "whole picture" in that case, I would need one feed per city. The problem arises when you use more than one feed with stops in common in a small region. [Just a little example of a good format of data I could think of] – dimrizo Apr 23 '15 at 15:48

score 1 · Answer 2 · answered May 10 '15 at 09:14

It depends on whether you need to import existing feeds. If you do, you have to be able to handle GTFS anyhow. In my case import was required, so I use the same format for data that stems from other sources such as PDF timetables; otherwise I would need to support two formats. If you do not need it for import (or export), you may consider your own format: I find GTFS does not reveal the actual network.

GTFS needs a fair amount of interpretation and digesting before you end up with a whole picture that you can answer planning questions on.

I merge stops together if they are close, like a few meters apart, and assume a 'trivial walk' if they are 10-50 meters apart. That automatically handles combining multiple feeds.
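A rough sketch of that merging step, in Python. The 5-metre threshold, the function names and the feed-prefixed stop ids are assumptions for illustration, not my actual code. It compares every pair of stops, which is fine for small feeds but would need a spatial index for large ones.

```python
import math


def distance_m(a, b):
    """Approximate distance in metres between two (lat, lon) pairs.

    Uses an equirectangular approximation, which is adequate for the
    very short distances involved here.
    """
    lat1, lon1 = a
    lat2, lon2 = b
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371000.0 * math.hypot(x, y)


def merge_close_stops(stops, merge_m=5.0):
    """Map each stop id to the representative id of its cluster.

    stops: dict of stop_id -> (lat, lon), possibly from several feeds.
    merge_m: assumed threshold below which two stops count as one.
    """
    parent = {stop_id: stop_id for stop_id in stops}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = sorted(stops)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if distance_m(stops[a], stops[b]) <= merge_m:
                root_a, root_b = find(a), find(b)
                if root_a != root_b:
                    # Keep the smaller id as the representative.
                    parent[max(root_a, root_b)] = min(root_a, root_b)
    return {stop_id: find(stop_id) for stop_id in ids}
```

Note that union-find merges transitively: a chain of stops each a few metres from the next ends up in one cluster, which may or may not be what you want. The 10-50 metre 'trivial walk' case could be handled the same way with a second threshold that records a walking link instead of merging.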

Apart from that, I turn the stop_times roughly inside-out to create a 'link' table. The end result is that for each stop you have a list of departures and their destinations.
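One possible reading of that inside-out step, sketched in Python. It assumes the `stop_times.txt` rows are already loaded as dictionaries and that a 'link' is a hop between two consecutive stops of a trip; the function names and the tuple layout are illustrative assumptions.

```python
from collections import defaultdict


def to_seconds(hms):
    """Convert a GTFS 'H:MM:SS' time (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def build_links(stop_times):
    """Index consecutive-stop hops by their departure stop.

    Returns a dict: from_stop_id -> list of
    (departure_s, to_stop_id, arrival_s, trip_id), sorted by departure.
    """
    by_trip = defaultdict(list)
    for row in stop_times:
        by_trip[row["trip_id"]].append(row)

    links = defaultdict(list)
    for trip_id, rows in by_trip.items():
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        # Pair each stop with the next one on the same trip.
        for here, nxt in zip(rows, rows[1:]):
            links[here["stop_id"]].append((
                to_seconds(here["departure_time"]),
                nxt["stop_id"],
                to_seconds(nxt["arrival_time"]),
                trip_id,
            ))
    for departures in links.values():
        departures.sort()
    return dict(links)
```

With this shape, "what leaves stop X after time T?" becomes a lookup in `links[X]` followed by a scan or binary search on the departure time.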

The biggest problem so far is that GTFS feeds can record the trips from an operator's point of view. Passengers can remain seated in the bus while it flips the headsign from 351 to 285, takes a new driver on board and continues. That means you need to know which trips should be seen as joined in passenger terms.
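GTFS does define an optional `block_id` column in `trips.txt` for sequential trips made by the same vehicle, and some feeds fill it in. The Python sketch below is only a heuristic built on that field: the `first_stop`, `last_stop`, `start_s` and `end_s` keys are hypothetical values you would derive from `stop_times.txt` yourself, and sharing a block does not by itself guarantee that passengers may stay on board.

```python
from collections import defaultdict


def candidate_joins(trips):
    """Return (earlier_trip_id, later_trip_id) pairs that may be joined.

    trips: dicts with trip_id, service_id, block_id (GTFS fields) plus
    first_stop, last_stop, start_s, end_s (hypothetical, precomputed).
    """
    blocks = defaultdict(list)
    for trip in trips:
        if trip.get("block_id"):
            blocks[(trip["service_id"], trip["block_id"])].append(trip)

    joins = []
    for group in blocks.values():
        group.sort(key=lambda t: t["start_s"])
        for earlier, later in zip(group, group[1:]):
            # Only pair trips that meet at the same stop, in time order.
            same_place = earlier["last_stop"] == later["first_stop"]
            in_order = later["start_s"] >= earlier["end_s"]
            if same_place and in_order:
                joins.append((earlier["trip_id"], later["trip_id"]))
    return joins
```

Feeds that leave `block_id` empty give no candidates at all with this approach, so it complements rather than replaces manual knowledge of the network.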

I solved a minor problem with manual feed entry by having my GTFS parser accept a handful of constructs that ease editing, such as leaving out the sequence numbers so that they are generated incrementally, and recognising 02.13+1 as 26.13.
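For illustration, the two conveniences could look roughly like this in Python. This is a sketch, not my actual parser: the function names are made up, and I assume the `+N` suffix means "N days later" and that times are written `HH.MM` as in the example above.

```python
import re

# Hours, a dot, two-digit minutes, and an optional "+days" suffix.
_TIME = re.compile(r"^(\d{1,2})\.(\d{2})(?:\+(\d+))?$")


def expand_time(text):
    """Expand 'HH.MM+D' shorthand into an over-24-hour 'HH.MM' time."""
    match = _TIME.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    hours = int(match.group(1))
    minutes = int(match.group(2))
    days = int(match.group(3) or 0)
    return f"{hours + 24 * days:02d}.{minutes:02d}"


def fill_sequences(rows):
    """Fill in missing stop_sequence values by counting up.

    Rows keep their order; a row without a sequence number gets the
    previous row's number plus one (starting at 1).
    """
    filled = []
    previous = 0
    for row in rows:
        given = row.get("stop_sequence")
        previous = int(given) if given not in (None, "") else previous + 1
        filled.append({**row, "stop_sequence": previous})
    return filled
```

For example, `expand_time("02.13+1")` gives `"26.13"`, while a plain `"08.05"` passes through unchanged.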