
Context: I generated a networkx graph of stop stations for different modes of transport. The only attributes each stop station has are its id, name, lon and lat positions.

I want to add other attributes to each point. These attributes are found in 3 csv files that I opened as dicts (I simplified them quite a lot for easier reading):

from csv import DictReader

# index each row of the stops file by its stop_id
with open(STOPS_FILE, 'r', newline='') as f:
    Dict2 = {stop['stop_id']: stop for stop in DictReader(f)}


Dict2:   ### Dict gotten from the nx graph.
{'stop1': OrderedDict([('stop_id', 'stop1'),
              ('stop_name', 'name1'),
              ('lat', 'lat1'),
              ('lon', 'lon1')]),
 'stop2': OrderedDict([('stop_id', 'stop2'),
              ('stop_name', 'name2'),
              ('lat', 'lat2'),
              ('lon', 'lon2')]), ...}

Dict1:   ### Dict that links Dict2 and Dict3.
{'stop1': OrderedDict([('trip_id', 'trip1'),
              ('t1', '01:43:00'),
              ('t2', '01:43:00')]),
 'stop2': OrderedDict([('trip_id', 'trip2'),
              ('t1', '18:14:00'),
              ('t2', '18:14:00')]), ...}

Dict3:   ### Dict containing trip_id and route_id.
{'trip1': OrderedDict([('route_id', 'route1'),
              ('trip_id', 'trip1'),
              ('direction_id', '0')]),
 'trip2': OrderedDict([('route_id', 'route2'),
              ('trip_id', 'trip2'),
              ('direction_id', '0')]), ...}

I would like to link Dict1, Dict2 and Dict3 in one single multi-leveled dict that I plan to use in a nx.set_node_attributes() afterward.

For each stop_id in Dict2, I would like to add every corresponding trip_id found in Dict3. Then, for each trip_id added this way, I would like to add every corresponding route_id, which is also found in Dict3.

My issues are the following:

  • I can't seem to accumulate values that have the same key instead of replacing them. I tried what was proposed in this post but couldn't make it work, so I tried another approach; below is what I have so far. Each stop_id has one or more corresponding trip_id values, but I only get the very last trip_id value.
test_dict = dict()

for s in Dict2: # 's' stands for stop.
    test_dict['{}'.format(s)] = {}
    for t in Dict3: # 't' stands for trip.
        test_dict['{}'.format(s)]['trip_id'] = t
print(test_dict)

>>> {'stop1': {'trip_id': 'tripn'},  #'tripn' corresponds to the last trip_id value.
 'stop2': {'trip_id': 'tripn'},
 'stop3': {'trip_id': 'tripn'},
 'stop4': {'trip_id': 'tripn'},
 'stop5': {'trip_id': 'tripn'}, ...}
  • Also, one of my biggest issues is that route_id is not a key but a value in Dict3, and I have no idea how to handle that. Any pointers would be greatly appreciated here.
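
Both points can be illustrated with a minimal sketch, under stated assumptions. Accumulating instead of replacing is done here by appending to a list per key with collections.defaultdict. The name stop_trip_rows is hypothetical: it is a list of (stop_id, trip_id) pairs standing in for the rows of the linking csv file, and all sample values are made up. Because route_id is a value inside each Dict3 record, it is read through the trip_id key as Dict3[trip_id]['route_id'].

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the linking csv (one row per stop/trip pair).
stop_trip_rows = [
    ('stop1', 'trip1'),
    ('stop1', 'trip2'),
    ('stop2', 'trip3'),
]

# Simplified stand-in for Dict3: trip_id -> record containing route_id.
Dict3 = {
    'trip1': {'route_id': 'route1', 'trip_id': 'trip1', 'direction_id': '0'},
    'trip2': {'route_id': 'route1', 'trip_id': 'trip2', 'direction_id': '0'},
    'trip3': {'route_id': 'route2', 'trip_id': 'trip3', 'direction_id': '0'},
}

# Append to a list per stop so earlier trip_ids are kept, not replaced.
trips_by_stop = defaultdict(list)
for stop_id, trip_id in stop_trip_rows:
    trips_by_stop[stop_id].append(trip_id)

# route_id is a value of each Dict3 record, so look it up through the trip_id key.
nested = {
    stop_id: {trip_id: Dict3[trip_id]['route_id'] for trip_id in trip_ids}
    for stop_id, trip_ids in trips_by_stop.items()
}

print(nested)
# {'stop1': {'trip1': 'route1', 'trip2': 'route1'}, 'stop2': {'trip3': 'route2'}}
```

The key difference from the loop above is that each stop owns a list that grows with append, whereas assigning to test_dict[s]['trip_id'] keeps only the last value written.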

The outcome should look like this:


{stop1
     trip1
          route1
     trip2
          route1

stop2
     trip3
          route1
     trip4
          route1
     trip5
          route2
...}

I know it doesn't seem logical to have trip_id before route_id, but I won't work with route_id as much as with trip_id, so this layout should, in theory, make my future work easier.

I have looked at many posts about creating nested dictionaries with python and especially this one that goes into multi-level dict but I still couldn't find a solution to my problem, so here I am.

I could always open the 3 csv as dataframes, merge them and then make the desired dict out of them but I don't know how to go about that either.

marc_s
I.M.

1 Answer


I am not sure if you want to merge all information from the dictionaries, or just the stop-trip-route names like you specified. For the latter, here is some simple code that creates a dictionary with

  stop
    trip
      route

structure:

# initialise new dictionary
new_dict = {}

for stop in Dict2:

    # access the "connection dict" and get the trip_id(s);
    # a stop with no entry in Dict1 simply gets no trips
    trip_ids = Dict1.get(stop, {}).get('trip_id', [])

    # initialise trip dict
    trip_dict = {}

    # if there is only one trip_id, create a list with a single entry
    if not isinstance(trip_ids, list):
        trip_ids = [trip_ids]

    for trip_id in trip_ids:

        # using trip id, get route info (None if the trip is not in Dict3):
        route_id = Dict3.get(trip_id, {}).get('route_id')

        # combine information
        trip_dict[trip_id] = route_id

    new_dict[stop] = trip_dict

If a given stop_id has more than one trip_id (that is, if Dict1[stop]['trip_id'] is a list of trip ids), new_dict will look like this:

new_dict = {
    'stop_01': {
        'trip1': 'route1',
        'trip2': 'route2'
    }
}

You can verify this by accessing the keys:

new_dict['stop_01'].keys()
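
If the goal is instead to merge all the information, one possible layout keeps each stop's own fields and nests the full trip records under a separate key. The following is only a sketch under assumptions, not part of the code above: the 'trips' key is a hypothetical name, the sample dicts are made-up stand-ins for Dict1, Dict2 and Dict3, and Dict1 is assumed to hold a list of trip ids per stop.

```python
# Made-up stand-ins for the three dicts; Dict1 is assumed to map each stop to a list of trip_ids.
Dict2 = {'stop1': {'stop_id': 'stop1', 'stop_name': 'name1', 'lat': 'lat1', 'lon': 'lon1'}}
Dict1 = {'stop1': {'trip_id': ['trip1', 'trip2']}}
Dict3 = {
    'trip1': {'route_id': 'route1', 'trip_id': 'trip1', 'direction_id': '0'},
    'trip2': {'route_id': 'route2', 'trip_id': 'trip2', 'direction_id': '0'},
}

merged = {}
for stop_id, stop_info in Dict2.items():
    # Copy the stop's own fields so the source dict is left untouched.
    entry = dict(stop_info)
    trip_ids = Dict1.get(stop_id, {}).get('trip_id', [])
    # 'trips' is a hypothetical key holding the full record of every known trip.
    entry['trips'] = {t: dict(Dict3[t]) for t in trip_ids if t in Dict3}
    merged[stop_id] = entry

print(merged['stop1']['trips']['trip2']['route_id'])  # route2
```

A dict of this shape, keyed by node id with one attribute dict per node, is the form nx.set_node_attributes accepts, provided the keys match the node ids used in the graph.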
warped
  • Thank you for your answer. Ideally I would like to merge all the information, but I'll start with the stop-trip-route combination and work from there. What you posted gives me only one trip_id-route_id combination for each stop_id, but how can I list all the trip_id-route_id pairs when there is more than one for a stop_id (which is almost always the case)? – I.M. Nov 08 '19 at 08:34
  • Could you update the question so that it better reflects the data? There, you only have one trip-route combination. – warped Nov 08 '19 at 12:01
  • I could try, but I showed in the outcome part that every `stop` can have more than one `trip` related to it (while each `trip` has only one `route` related to it). – I.M. Nov 08 '19 at 14:06