Group and Check-mark using Python

Question

I have several files, each of which has data like this (filename:data inside separated by newline):

Mike: Plane\nCar
Paula: Plane\nTrain\nBoat\nCar
Bill: Boat\nTrain
Scott: Car

How can I create a CSV file using Python that groups all the different vehicles and then puts an X under each applicable person, like:

output

Are the line numbers also in your file? — Sven Marnach, May 30 '11 at 20:45
No, that's just to show that there are separate files. — mike, May 30 '11 at 21:39

score 1 · Accepted Answer · answered May 30 '11 at 21:15

Assuming those line numbers aren't in there (easy enough to fix if they are), and with an input file like the following:

Mike: Plane
Car
Paula: Plane
Train
Boat
Car
Bill: Boat
Train
Scott: Car

The solution can be found here: https://gist.github.com/999481

import sys
from collections import defaultdict
import csv

# see http://stackoverflow.com/questions/6180609/group-and-check-mark-using-python
def main():
    # files = ["group.txt"]
    files = sys.argv[1:]
    if len(files) < 1:
        print "usage: ./python_checkmark.py file1 [file2 ... filen]"
        return  # nothing to process, so stop after showing the usage line

    name_map = defaultdict(set)

    for f in files:
        file_handle = open(f, "r")
        process_file(file_handle, name_map)
        file_handle.close()

    print_csv(sys.stdout, name_map) 

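def process_file(input_file, name_map):
    cur_name = ""
    for line in input_file:
        if not line.strip():
            continue  # skip blank lines so they don't become empty items
        if ":" in line:
            # split on the first colon only, in case an item contains one
            cur_name, item = [x.strip() for x in line.split(":", 1)]
        else:
            item = line.strip()
        name_map[cur_name].add(item)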


def print_csv(output_file, name_map):
    names = name_map.keys()
    items = set([])
    for item_set in name_map.values():
        items = items.union(item_set)

    writer = csv.writer(output_file, quoting=csv.QUOTE_MINIMAL)
    writer.writerow( [""] + names )
    for item in sorted(items):
        row_contents = map(lambda name:"X" if item in name_map[name] else "", names)
        row = [item] + row_contents
        writer.writerow( row )


if __name__ == '__main__':
    main()

Output:

,Mike,Bill,Scott,Paula 
Boat,,X,,X 
Car,X,,X,X 
Plane,X,,,X 
Train,,X,,X

The one thing this script doesn't do is keep the columns in the order the names appear in the input. You could keep a separate list to maintain that order, since dicts in Python 2 are inherently unordered.
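The separate-list idea mentioned above could look something like the following. This is only a sketch in Python 3, not the code from the gist: the names parse_in_order and write_csv and the inline sample data are made up for illustration, and it reads lines from a list rather than from files.

```python
import csv
import io

def parse_in_order(lines):
    """Return (names in first-seen order, {name: set of items})."""
    order = []        # names in the order they first appear
    name_map = {}
    cur_name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if ":" in line:
            cur_name, item = [x.strip() for x in line.split(":", 1)]
            if cur_name not in name_map:
                name_map[cur_name] = set()
                order.append(cur_name)
        else:
            item = line
        if cur_name is not None and item:
            name_map[cur_name].add(item)
    return order, name_map

def write_csv(out, order, name_map):
    # Union of every person's items, sorted so the rows come out in a stable order.
    items = sorted(set().union(*name_map.values()))
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([""] + order)
    for item in items:
        writer.writerow([item] + ["X" if item in name_map[n] else "" for n in order])

# Sample data copied from the input shown earlier in this answer.
sample = ["Mike: Plane", "Car", "Paula: Plane", "Train", "Boat", "Car",
          "Bill: Boat", "Train", "Scott: Car"]
order, name_map = parse_in_order(sample)
buf = io.StringIO()   # stand-in for sys.stdout or a real file
write_csv(buf, order, name_map)
print(buf.getvalue())
```

With the sample above, the header row comes out as ,Mike,Paula,Bill,Scott, which is the order the names appear in the input.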

This works very well, the only thing is that file output generates a newline after each row. — mike, May 30 '11 at 22:47
Actually, the problem is that I did not create the output csv file in binary, as per this [post](http://stackoverflow.com/questions/1170214/pythons-csv-writer-produces-wrong-line-terminator) — mike, May 31 '11 at 13:04
Ah I see. Good to know - very unintuitive that you'd open the file in binary mode, considering it's text. — I82Much, May 31 '11 at 14:43
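For anyone reading this on Python 3: binary mode is the Python 2 fix. In Python 3 the csv module expects a text-mode file opened with newline="" instead. A minimal sketch, assuming a throwaway temp directory and a made-up file name (vehicles.csv) and rows:

```python
import csv
import os
import tempfile

rows = [["", "Mike", "Bill"], ["Boat", "", "X"], ["Car", "X", ""]]

# Hypothetical output location, used only for this demonstration.
out_path = os.path.join(tempfile.mkdtemp(), "vehicles.csv")

# newline="" stops the text layer from translating the "\r\n" that
# csv.writer emits, which is what produces the extra blank rows on Windows.
with open(out_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the raw bytes back to check the line terminators.
with open(out_path, "rb") as f:
    raw = f.read()
print(raw)
```

Each row should end in a single "\r\n", with no doubled carriage return.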

Zaur Nasibov · Answer 2 · 2011-05-30T21:19:01.040

Here is an example of how to parse this kind of file.

Note that the dictionary is unordered here. You can use an ordered dict from the standard library (Python 3.2 / 2.7), find an available implementation or backport if you have an older Python version, or just save the order in an additional list :)
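As a rough illustration of the ordered-dict option, here is a Python 3 sketch using collections.OrderedDict. The parse function and the sample lines are hypothetical, and on Python 3.7+ a plain dict would keep insertion order as well.

```python
from collections import OrderedDict

def parse(lines):
    data = OrderedDict()   # remembers the order in which names were first inserted
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue       # ignore blank lines
        if ':' in line:
            name, vehicle = [part.strip() for part in line.split(':', 1)]
            data.setdefault(name, set())
        else:
            vehicle = line
        if name is not None and vehicle:
            data[name].add(vehicle)
    return data

# Made-up sample lines in the same format as the question.
data = parse(["Paula: Plane", "Train", "Bill: Boat", "Mike: Car"])
print(list(data))
```

Iterating over data then yields the names in the order they were read, so the CSV columns or rows can follow the input order.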

file_path = 'input.txt'  # placeholder: path to one input file

data = {}
name = None

with open(file_path) as f:
    for line in f:
        line = line.strip()  # drop the trailing newline and stray spaces
        if not line:
            continue  # skip blank lines
        if ':' in line:  # we have a name here
            name, first_vehicle = [part.strip() for part in line.split(':', 1)]
            data[name] = set([first_vehicle])  # a set of vehicles per name
        elif name:
            data[name].add(line)

# now a dictionary with names/vehicles is available
# let's convert it to simple csv-formatted string..

# all available vehicles, sorted so the column order is stable
vehicles = sorted(set(v for vlist in data.values()
                      for v in vlist))

for name in data:
    name_vehicles = data[name]
    csv_vehicles = ''
    for v in vehicles:
        if v in name_vehicles:
            csv_vehicles += v
        csv_vehicles += ','

    csv_line = name + ',' + csv_vehicles
    print(csv_line)

Fredrik Pihl · Answer 3 · 2011-05-30T21:20:34.787

Assuming that the input looks like this:

Mike: Plane
Car
Paula: Plane
Train
Boat
Car
Bill: Boat
Train
Scott: Car

This Python script places the vehicles in a dictionary, indexed by person:

#!/usr/bin/python

persons={}
vehicles=set()

with open('input') as fd:
    for line in fd:
        line = line.strip()
        if ':' in line:
            tmp = line.split(':')
            p = tmp[0].strip()
            v = tmp[1].strip()
            persons[p]=[v]
            vehicles.add(v)
        else:
            persons[p].append(line)
            vehicles.add(line)

for k,v in persons.iteritems():
    print k,v

print 'vehicles', vehicles

Result:

Mike ['Plane', 'Car']
Bill ['Boat', 'Train']
Scott ['Car']
Paula ['Plane', 'Train', 'Boat', 'Car']
vehicles set(['Train', 'Car', 'Plane', 'Boat'])

Now all the data needed is in data structures. The CSV part is left as an exercise for the reader :-)
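One possible way to do that exercise, sketched in Python 3 with the csv module. The persons and vehicles values are copied from the result above. Sorting the names for column order and writing to an in-memory buffer are assumptions made for the example, not part of the original script.

```python
import csv
import io

# Data in the same shape the script above builds (values copied from its output).
persons = {
    'Mike': ['Plane', 'Car'],
    'Bill': ['Boat', 'Train'],
    'Scott': ['Car'],
    'Paula': ['Plane', 'Train', 'Boat', 'Car'],
}
vehicles = set(['Train', 'Car', 'Plane', 'Boat'])

names = sorted(persons)   # fixed column order (an arbitrary choice)
out = io.StringIO()       # stand-in for a real output file
writer = csv.writer(out, lineterminator='\n')
writer.writerow([''] + names)
for vehicle in sorted(vehicles):
    # One row per vehicle, with an X under each person who has it.
    writer.writerow([vehicle] + ['X' if vehicle in persons[n] else '' for n in names])

print(out.getvalue())
```

To write to disk instead, the buffer could be swapped for a file opened with newline="".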

ninjagecko · Answer 4 · 2011-05-30T21:40:48.363

The most elegant and simple way would be like so:

import os

vehiclesToPeople = {}
people = []

for root, dirs, files in os.walk('/path/to/folder/with/files'):
    for file in files:
        person = file  # the file name is the person's name
        people.append(person)
        path = os.path.join(root, file)

        with open(path) as f:
            for line in f:
                vehicle = line.strip()  # drop the trailing newline
                if vehicle:  # skip blank lines
                    vehiclesToPeople.setdefault(vehicle, set()).add(person)

people.sort()
table = [[''] + people]  # header row
for vehicle, owners in sorted(vehiclesToPeople.items()):
    # first column is the vehicle, then an X for each owner
    table.append([vehicle] + [('X' if p in owners else '') for p in people])

csv = '\n'.join(','.join(row) for row in table)

You can do pprint.pprint(table) as well to look at it.
