
I have several files, each of which holds data like this (shown as filename: contents, with the entries inside each file separated by newlines):

  1. Mike: Plane\nCar
  2. Paula: Plane\nTrain\nBoat\nCar
  3. Bill: Boat\nTrain
  4. Scott: Car

How can I create a CSV file using Python that groups all the different vehicles and then puts an X under each applicable person, like:

[image: output]

mike

4 Answers


Assuming those line numbers aren't in the files (easy enough to fix if they are), and given an input file like the following:

Mike: Plane
Car
Paula: Plane
Train
Boat
Car
Bill: Boat
Train
Scott: Car

The solution can be found here: https://gist.github.com/999481

import sys
from collections import defaultdict
import csv

# see http://stackoverflow.com/questions/6180609/group-and-check-mark-using-python
def main():
    # files = ["group.txt"]
    files = sys.argv[1:]
    if len(files) < 1:
        print("usage: ./python_checkmark.py file1 [file2 ... filen]")
        sys.exit(1)  # nothing to process, so stop here

    name_map = defaultdict(set)

    for f in files:
        file_handle = open(f, "r")
        process_file(file_handle, name_map)
        file_handle.close()

    print_csv(sys.stdout, name_map) 

def process_file(input_file, name_map):
    cur_name = ""
    for line in input_file:
        if ":" in line:
            # split on the first colon only, so an item may itself contain ":"
            cur_name, item = [x.strip() for x in line.split(":", 1)]
        else:
            item = line.strip()
        name_map[cur_name].add(item)


def print_csv(output_file, name_map):
    # list() keeps this working on both Python 2 and Python 3
    names = list(name_map.keys())
    # the union of every person's items gives the full set of rows
    items = set()
    for item_set in name_map.values():
        items |= item_set

    writer = csv.writer(output_file, quoting=csv.QUOTE_MINIMAL)
    writer.writerow([""] + names)
    for item in sorted(items):
        row_contents = ["X" if item in name_map[name] else "" for name in names]
        writer.writerow([item] + row_contents)


if __name__ == '__main__':
    main()

Output:

,Mike,Bill,Scott,Paula 
Boat,,X,,X 
Car,X,,X,X 
Plane,X,,,X 
Train,,X,,X 

The only thing this script doesn't do is keep the columns in the order in which the names appear in the input. You could keep a separate list to maintain that order, since dicts are unordered (in Python 2, and in Python 3 before 3.7).
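
The "separate list" idea might look roughly like the sketch below. It is not the code from the gist: `parse_groups`, `to_csv`, and the hard-coded `sample` lines are hypothetical names introduced for illustration, and it assumes Python 3 (for `io.StringIO` with the `csv` module).

```python
import csv
import io


def parse_groups(lines):
    """Return (names in first-seen order, {name: set of items})."""
    order = []       # hypothetical helper list that remembers column order
    name_map = {}
    cur_name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if ":" in line:
            cur_name, item = [x.strip() for x in line.split(":", 1)]
            if cur_name not in name_map:
                name_map[cur_name] = set()
                order.append(cur_name)  # record each name once, on first sight
        else:
            item = line
        if cur_name is not None and item:
            name_map[cur_name].add(item)
    return order, name_map


def to_csv(order, name_map):
    """Build the check-mark table as CSV text, columns in first-seen order."""
    items = sorted(set().union(*name_map.values()))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([""] + order)
    for item in items:
        writer.writerow([item] + ["X" if item in name_map[n] else "" for n in order])
    return buf.getvalue()


# Sample data copied from the input file shown earlier in this answer
sample = ["Mike: Plane", "Car", "Paula: Plane", "Train", "Boat", "Car",
          "Bill: Boat", "Train", "Scott: Car"]
order, name_map = parse_groups(sample)
print(to_csv(order, name_map))
```

With this sample the header row comes out as `,Mike,Paula,Bill,Scott`, matching the order of the input rather than the dict's internal order.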

I82Much
  • This works very well, the only thing is that file output generates a newline after each row. – mike May 30 '11 at 22:47
  • Um.. wouldn't you want each row to be on its own line? – I82Much May 30 '11 at 23:50
  • Actually, the problem is that I did not create the output csv file in binary mode, as per this [post](http://stackoverflow.com/questions/1170214/pythons-csv-writer-produces-wrong-line-terminator) – mike May 31 '11 at 13:04
  • Ah I see. Good to know - very unintuitive that you'd open the file in binary mode, considering it's text. – I82Much May 31 '11 at 14:43
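
For readers hitting the same extra-newline issue discussed in these comments, here is a minimal sketch of how the output file might be opened. The `rows` data and `out_path` are made up for illustration. On Python 2 the fix was binary mode (`"wb"`), as the linked post describes; on Python 3 the `csv` documentation recommends text mode with `newline=""` instead.

```python
import csv
import os
import tempfile

# Made-up rows, used only for this illustration
rows = [["", "Mike", "Bill"], ["Boat", "", "X"], ["Car", "X", ""]]

# Hypothetical output path inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "vehicles.csv")

# Python 3: newline="" stops the file object from translating the "\r\n"
# that csv.writer emits, which is what produces blank rows on Windows.
# (The Python 2 equivalent is open(out_path, "wb").)
with open(out_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the bytes back to confirm that each row ends in exactly one "\r\n"
with open(out_path, "rb") as f:
    raw = f.read()
print(repr(raw))
```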

Here is an example of how to parse this kind of file.

Note that the dictionary is unordered here. You can use OrderedDict from the standard library (available in Python 2.7 and 3.1+), find an implementation or backport if you are on an older Python version, or just keep the order in an additional list :)

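data = {}
name = None
file_path = 'group.txt'  # placeholder: path to one input file

with open(file_path) as f:
    for line in f:
        line = line.strip()  # drop the trailing newline and stray spaces
        if not line:
            continue  # skip blank lines
        if ':' in line: # we have a name here
            name, first_vehicle = [part.strip() for part in line.split(':', 1)]
            data[name] = set([first_vehicle, ])  # a set of vehicles per name
        else:
            if name:
                data[name].add(line)

# now a dictionary with names/vehicles is available
# let's convert it to simple csv-formatted lines..

# a sorted list of all available vehicles, so the column order is stable
vehicles = sorted(set(v for vlist in data.values()
                      for v in vlist))

csv_lines = []
for name in data:
    name_vehicles = data[name]
    csv_vehicles = ''
    for v in vehicles:
        if v in name_vehicles:
            csv_vehicles += v
        csv_vehicles += ','

    # one line per person: the name, then one column per vehicle
    csv_lines.append(name + ',' + csv_vehicles)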
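
The ordered-dict option mentioned in the note above could look roughly like this. It is only a sketch: the `lines` list stands in for the contents of a file and is made up for illustration, as is the `names_in_order` variable.

```python
from collections import OrderedDict

# Made-up stand-in for the lines of an input file
lines = ["Mike: Plane", "Car", "Paula: Plane", "Train", "Bill: Boat"]

data = OrderedDict()  # remembers the order in which names are first inserted
name = None
for line in lines:
    line = line.strip()
    if ':' in line:  # a line that starts a new person
        name, vehicle = [part.strip() for part in line.split(':', 1)]
        data.setdefault(name, set()).add(vehicle)
    elif name and line:  # a continuation line for the current person
        data[name].add(line)

names_in_order = list(data)  # names in the order they appeared in the input
print(names_in_order)
```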
Zaur Nasibov

Assuming that the input looks like this:

Mike: Plane
Car
Paula: Plane
Train
Boat
Car
Bill: Boat
Train
Scott: Car

This Python script places the vehicles in a dictionary keyed by person:

#!/usr/bin/python

persons={}
vehicles=set()

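with open('input') as fd:
    for line in fd:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if ':' in line:
            # split on the first colon only: name before it, vehicle after it
            tmp = line.split(':', 1)
            p = tmp[0].strip()
            v = tmp[1].strip()
            persons[p]=[v]
            vehicles.add(v)
        else:
            persons[p].append(line)
            vehicles.add(line)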

for k,v in persons.iteritems():
    print k,v

print 'vehicles', vehicles

Result:

Mike ['Plane', 'Car']
Bill ['Boat', 'Train']
Scott ['Car']
Paula ['Plane', 'Train', 'Boat', 'Car']
vehicles set(['Train', 'Car', 'Plane', 'Boat'])

Now all the data needed is held in data structures. The CSV part is left as an exercise for the reader :-)
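
One possible take on that exercise is sketched below. It assumes Python 3, hard-codes `persons` in the same shape as the script above builds it, and introduces a hypothetical helper named `write_table`. Columns are sorted alphabetically, which is a choice made here, not something the question requires.

```python
import csv
import io

# Same shape as the structures built above, hard-coded for illustration
persons = {
    "Mike": ["Plane", "Car"],
    "Paula": ["Plane", "Train", "Boat", "Car"],
    "Bill": ["Boat", "Train"],
    "Scott": ["Car"],
}
vehicles = set(v for owned in persons.values() for v in owned)


def write_table(out, persons, vehicles):
    """Write one row per vehicle, with an X under each person who has it."""
    names = sorted(persons)  # alphabetical column order (an assumption)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([""] + names)
    for vehicle in sorted(vehicles):
        writer.writerow([vehicle] +
                        ["X" if vehicle in persons[n] else "" for n in names])


buf = io.StringIO()
write_table(buf, persons, vehicles)
text = buf.getvalue()
print(text)
```

Passing an open file (or `sys.stdout`) instead of the `StringIO` buffer would write the same table straight to disk or to the terminal.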

Fredrik Pihl

The most elegant and simple way would be like so:

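import os

vehiclesToPeople = {}
people = []

for root, dirs, files in os.walk('/path/to/folder/with/files'):
    for filename in files:
        person = filename  # each file is named after a person
        people.append(person)
        path = os.path.join(root, filename)

        with open(path) as f:
            for line in f:
                vehicle = line.strip()  # drop the trailing newline
                if vehicle:  # skip blank lines
                    vehiclesToPeople.setdefault(vehicle, set()).add(person)

people.sort()
table = [[''] + people]  # header row: empty corner cell, then the names
for vehicle, owners in sorted(vehiclesToPeople.items()):
    # first column is the vehicle, then an X for each person who has it
    table.append([vehicle] + [('X' if p in owners else '') for p in people])

csv = '\n'.join(','.join(row) for row in table)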

You can also call pprint.pprint(table) (after import pprint) to inspect the table.

ninjagecko