1

I have data in the format

from        to
Location1   Location2
Location1   Location3
Location1   Location4
Location1   Location5

Location2   Location1
Location2   Location3

Location3   Location1
Location3   Location2
Location3   Location4

In a csv file. The data maps bike trips from one station to another and is taken from the website of a bike rental company in Chicago.

Right now I have basic code that takes each row and adds it to a list, but it does not create a dictionary in the second index as I am wishing for. My script looks like:

import csv
li = []
with open('Desktop/test_Q4_trips.csv', newline='') as csvfile:
reader = csv.reader(csvfile, delimiter=',')
for name, imports in reader:
    li.append({
        "name": name,
        "imports": imports,
    })
del li[0]

Here is the output,

[{"from": "Location1", "to": "Location2"}, {"from": "Location1", "to": "Location3"},
{"from": "Location1", "to": "Location4"}, {"from": "Location1", "to": "Location5"}, 
...]

I want to convert this data into a this format,

[{"from": "Location1", "to": ["Location2", "Location3", "Location4", "Location5"]},
    {"from": "Location2", "to": ["Location1", "Location3"]},
    {"from": "Location3", "to": ["Location1", "Location2", "Location4"]}, ...
].

In words, I would like to create a list of dictionaries where each dictionary has a single value in the first index and a list of (variably many) values in the second index. In particular, the output should list all of the stations that are on the receiving end of a bike rental trip in a list in the second index. To do this, I imagine I will have to create a script with a for loop that loops through the "from" values on the left and appends every "to" location that corresponds to each "from" one to a list.

I would like my data to be in the particular form I am mentioning in order to work with data visualization code that I have. I am sure that creating the format that I would like needs a leap in thought, but I am not sure exactly what to do to satisfy this. I am also not sure if the output type I need should be a list or an array, and would appreciate a clarification about this.

Please help me solve this, Thanks in advance.

Dacre Denny
  • 29,664
  • 5
  • 45
  • 65
Ashok B. Raife
  • 93
  • 3
  • 11

2 Answers2

2

The collections.defaultdict might be a good approach to solve this problem.

from collections import defaultdict


d = defaultdict(list)

a = [{"from": "Location1", "to": "Location2"}, {"from": "Location1", "to": "Location3"},
     {"from": "Location1", "to": "Location4"}, {"from": "Location1", "to": "Location5"}]


for o in a:
    d[o['from']].append(o['to'])

print(d)
leotrubach
  • 1,509
  • 12
  • 15
  • This gives the output `[{"Location1": ["Location2", "Location3", "Location4", "Location5"]}]` not `[{"from": "Location1", "to": ["Location2", "Location3", "Location4", "Location5"]}]`. Any idea how I would go about adding the "from" and "to" into the dictionary? Thanks. – Ashok B. Raife Aug 24 '18 at 19:46
  • 1
    Oh, sorry, just add following line: `res = [{"from": k, "to": v} for k, v in d.items()]` to the end, @AshokB.Raife – leotrubach Aug 25 '18 at 08:48
  • Great. One last question: how would I remove any duplicate output, like if a trip route is listed twice in the data but I want there to be only one record in the "to" column of the output file? – Ashok B. Raife Aug 25 '18 at 13:39
  • 1
    In that case it is better to use set instead of list. So you use `d = defaultdict(set)` instead of `list` and `d[o['from']].add(o['to'])` instead of `append()` – leotrubach Aug 25 '18 at 16:57
0

This should work I think

import numpy as np
l = [{"from": "Location1", "to": "Location2"}, {"from": "Location1", "to": "Location3"},
 {"from": "Location1", "to": "Location4"}, {"from": "Location1", "to": "Location5"}]

from_to = np.array(([d['from'] for d in l],[d['to'] for d in l])).T
froms = set(from_to[:,0])

out = []
for f in froms: 
    d = {}
    mask = from_to[:,0]==f
    d['from']=f
    d['to'] = from_to[:,1][mask]
    out.append(d)
kevinkayaks
  • 2,636
  • 1
  • 14
  • 30