1

I have a csv file that looks like this:

original csv:

vehicle,status,count 
car1,used,10 
car2,free,20 
car1,free,3 
car3,used,30 
car3,free,10

I am trying to group the 'vehicles per their status and then create a file like this: final output:

vehicle,free,used
car1,3,10
car2,20,0
car3,10,30

I am was looking at python collections module, which i think can be used for a task like this. But not sure how to proceed Any pointers will be much appreciated.

i have the following code so far:

import collections
import csv

out=[]

with open('file.csv', "rb") as f:
        reader = csv.DictReader(f)
        for row in reader:
                veh = row["vehicle"]

                out[veh] = out.get(veh, collections.defaultdict(int))
sergey
  • 143
  • 1
  • 2
  • 10
  • Questions asking for help must include a summary of the work you've done so far to solve the problem, and a description of the difficulty you are having solving it. Please include a snippet of code of what you've tried. – Davide Alberani Jun 22 '17 at 15:58
  • You may also have a look at pandas. I suspect this module has convenient data reorganization functions. – bli Jun 22 '17 at 16:52
  • any other way i can solve this without pandas? – sergey Jun 22 '17 at 16:58

1 Answers1

0

Using Pandas:

You can do something like that using pandas:

>>> import pandas as pd
>>> cars = pd.read_csv('/tmp/cars.csv')
>>> cars2 = pd.pivot_table(cars, index=["vehicle"], columns=["status"]).fillna(0)
>>> cars2.columns = ["free", "used"]
>>> print(cars2.astype(int).to_csv())
vehicle,free,used
car1,3,10
car2,20,0
car3,10,30

The car2 x used category doesn't exist in the original data, so fillna is used to replace the missing value with a 0.

The pivoting operation created a MultiIndex, hence the replacement with something simpler.

The astype(int) is here because by default, the values are handled as floats.

Using a defauldict

Another way of doing, storing the counts in a defaultdict of pairs of ints:

#!/usr/bin/env python3

from collections import defaultdict

with open("cars.csv", "r") as cars_file:
    header = cars_file.readline()
    cars_counter = defaultdict(lambda : [0, 0])
    for line in cars_file:
        veh, status, count = line.strip().split(",")
        if status == "free":
            cars_counter[veh][0] += int(count)
        elif status == "used":
            cars_counter[veh][1] += int(count)
        else:
            raise ValueError("""Unknown status "{status}".""".format(status=status))

print("vehicle,free,used")
for car, (free, used) in cars_counter.items():
    print(",".join((car, str(free), str(used))))

The output order is not guaranteed for python versions before 3.6 (see Dictionaries are ordered in Python 3.6+), and is not guaranteed in the future, so you may want to sort by keys if the line order is important.

bli
  • 7,549
  • 7
  • 48
  • 94