Dictionary best data structure for train routes?

Question

So I've been tasked with essentially reading in a file (notepad file) that has a bunch of train stops and the time it takes to get from one stop to another. For example it would look like:

Stop A     15
Stop B     12
Stop C     9

Now I need to go back and access these stops and their times. I was thinking of reading in the file and storing it as a dictionary. My question is, would a dictionary be the best for this? Or is there some other python tool that would prove more useful? Any thoughts would be appreciated!

Did you try it? Did it work? This seems like an odd question. You have a way of doing it in mind. Try it, then post it on [code review](http://codereview.stackexchange.com/) if you think there might be improvements to be made, or here if something goes wrong. — Gareth Latty, Mar 20 '13 at 20:49
A dictionary should be sufficient for this kind of task, yes — Chris Forrence, Mar 20 '13 at 20:50
How do you wish to use the data? If the order of the stops is meaningful a plain dictionary will not work. — Steven Rumbalski, Mar 20 '13 at 21:07

dawg · Accepted Answer · 2013-03-28T06:05:39.467

I will go against the grain -- and say that a straight flat dict is not the best for this.

Let's say you have 100 stops and multiple routes that are non-alphabetical and non-numeric. Think the Paris subway:

Paris Subway

Now try and use a straight Python dict to calculate the time between FDR and La Fourche? That involves two or more different routes and multiple options.

A tree or some form of graph is a better structure. A dict is fabulous for a 1 to 1 mapping; tree are better for a rich description of nodes that relate to each other. You would then use something like Dijkstra's Algorithm to navigate it.

Since a nested dict of dicts or dict of lists IS a graph, it is easy to come up with a recursive example:

def find_all_paths(graph, start, end, path=[]):
        path = path + [start]
        if start == end:
            return [path]
        if start not in graph:
            return []
        paths = []
        for node in graph[start]:
            if node not in path:
                newpaths = find_all_paths(graph, node, end, path)
                for newpath in newpaths:
                    paths.append(newpath)
        return paths       

def min_path(graph, start, end):
    paths=find_all_paths(graph,start,end)
    mt=10**99
    mpath=[]
    print '\tAll paths:',paths
    for path in paths:
        t=sum(graph[i][j] for i,j in zip(path,path[1::]))
        print '\t\tevaluating:',path, t
        if t<mt: 
            mt=t
            mpath=path

    e1=' '.join('{}->{}:{}'.format(i,j,graph[i][j]) for i,j in zip(mpath,mpath[1::]))
    e2=str(sum(graph[i][j] for i,j in zip(mpath,mpath[1::])))
    print 'Best path: '+e1+'   Total: '+e2+'\n'  

if __name__ == "__main__":
    graph = {'A': {'B':5, 'C':4},
             'B': {'C':3, 'D':10},
             'C': {'D':12},
             'D': {'C':5, 'E':9},
             'E': {'F':8},
             'F': {'C':7}}
    min_path(graph,'A','E')
    min_path(graph,'A','D')
    min_path(graph,'A','F')

Prints:

    All paths: [['A', 'C', 'D', 'E'], ['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'D', 'E']]
        evaluating: ['A', 'C', 'D', 'E'] 25
        evaluating: ['A', 'B', 'C', 'D', 'E'] 29
        evaluating: ['A', 'B', 'D', 'E'] 24
Best path: A->B:5 B->D:10 D->E:9   Total: 24

    All paths: [['A', 'C', 'D'], ['A', 'B', 'C', 'D'], ['A', 'B', 'D']]
        evaluating: ['A', 'C', 'D'] 16
        evaluating: ['A', 'B', 'C', 'D'] 20
        evaluating: ['A', 'B', 'D'] 15
Best path: A->B:5 B->D:10   Total: 15

    All paths: [['A', 'C', 'D', 'E', 'F'], ['A', 'B', 'C', 'D', 'E', 'F'], ['A', 'B', 'D', 'E', 'F']]
        evaluating: ['A', 'C', 'D', 'E', 'F'] 33
        evaluating: ['A', 'B', 'C', 'D', 'E', 'F'] 37
        evaluating: ['A', 'B', 'D', 'E', 'F'] 32
Best path: A->B:5 B->D:10 D->E:9 E->F:8   Total: 32

I hadn't thought about it that way, it seems like a good method! I'll get started on it now. Thanks — user1440061, Mar 21 '13 at 02:00

Cartroo · Answer 2 · 2013-03-20T21:09:08.250

A dictionary is entirely suitable for looking up the value for a particular stop. This doesn't store any information about the order of the stops, however, so you'd likely need a separate list for that anyway. I'm assuming these times are the delay between adjacent stops only, so if you need to calculate the total time to get between arbitrary pairs of stops you might actually find a list of 2-tuples more convenient than a list and dictionary:

train_times = [("A", 0), ("B", 15), ("C", 12), ("D", 9)]

Note: the first time will always presumably be zero because there's no previous stop to time from. Alternatively you could make the final time zero and store the value against the previous stop, but I've assumed the first case here.

This allows you to calculate the total time from A to C quite simply:

def get_total_time(start_stop, end_stop):
    total_time = None
    for stop, this_time in train_times:
        if total_time is None and stop == start_stop:
            total_time = 0
        if total_time is not None:
            total_time += this_time
            if stop == end_stop:
                return total_time
    raise Exception("stop not found")

print "Time from A to C: %d" % (get_total_time("A", "C"),)

This could be made more efficient combining a list and a dictionary, but it doesn't make a lot of difference unless the list is very long (at least hundreds of stops).

Also, things get more complicated once you introduce lots of train lines which link up with each other. In this case you could use any number of data structures - one simple example would be a dictionary where the key is the stop and the value is a list of the adjacent stops on all lines and the times to them. However, finding a route through this requires some slightly non-trivial (but pretty well known) algorithms which are beyond the scope of this question.

In either case, if you need to look up these values quickly you could consider pre-calculating the time between all pairs of stops - this isknown as a the transitive closure of a graph (where "graph" in this sense is a network of nodes, not a line chart or similar).

score 1 · Answer 3 · answered Mar 20 '13 at 20:54

1

A dict is perfectly acceptable in this situation. In fact, if a database is unavailable to you and you're interested in reusing this dictionary in the future, you could pickle the object and use it in another script:

import pickle

output = open('my_pickled_dict', 'wb')
pickle.dumb(my_dict, output)
output.close

And then in your next script:

import pickle

my_pickled_file = open('my_pickled_dict', 'rb')
my_dict = pickle.load(my_pickled_file)

answered Mar 20 '13 at 20:54

That1Guy

7,075
4
47
59

-0. I imagine that the times between stops is only meaningful if you know what the *next* stop is. This implies that order is important, making a regular dictionary a poor solution. – Steven Rumbalski Mar 20 '13 at 21:12
Given the stops are 'named' sequentially, order is implied. In this situation I would fully support the use of a regular dictionary. If the order was not obvious, I would agree with you and recommend and ordered dict instead. – That1Guy Mar 20 '13 at 21:56

score 1 · Answer 4 · answered Mar 20 '13 at 20:55

A dictionary could be good for this, yes. But if you need to keep track of what stop comes next to the other, you might need something else, like an ordered dict, or a small class for a Stop that defines the next one (and whatever other data you'd like to keep about the stop), or even a list of stops, as it keeps the order.

Depending on what kind of access you need or what you want to do with this data, any of these could work for you.

Good luck.

score 0 · Answer 5 · answered Mar 20 '13 at 20:55

0

A dictionary is suitable for doing this.

It makes it easy for you to access the time (= value) for a given bus stop (= key).

time_per_stop = {"Stop A": 15, "Stop B": 12, "Stop C": 9}

Of course, in case you have a finite and small amount of bus stops, you could also just keep a list or tuple of stop times.

time_per_stop_list = [15, 12, 9]

answered Mar 20 '13 at 20:55

cfedermann

3,286
19
24

-0. I imagine that the times between stops is only meaningful if you know what the *next* stop is. This implies that order is important, making a regular dictionary a poor solution. – Steven Rumbalski Mar 20 '13 at 21:12

score 0 · Answer 6 · answered Mar 20 '13 at 20:55

0

A Dictionary is a set of values stored by the hashes of a set keys. The advantage of using a dictionary is the time needed to locate a specific record regardless of size is fixed (often called O(1)). Alternatively if you used a list, the time needed to locate a record would be equal to the time it takes to read each element (often called O(n) where n equals the number of elements in the list).

answered Mar 20 '13 at 20:55

Jason Sperske

29,816
8
73
124

1

To nitpick: You say *array*, but you really mean a Python *list*, right? A proper array would have O(1) lookup time for any given index. (Searching the array for a value is another story.) – kojiro Mar 20 '13 at 20:58

Dictionary best data structure for train routes?

6 Answers6

Linked

Related