
I have a CSV file that goes something like this:

['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

Now I need a way to merge all of the rows that have the same name in the first column into a single row, for instance:

['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

I can think of a way to do this by sorting the CSV and then going through each row and column and comparing each value, but there is probably an easier way to do it.

Any ideas?

jbssm

3 Answers

3

You should use itertools.groupby:

t = [ 
['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+'],
['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', ''],
['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', ''],
['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''] 
]

from itertools import groupby

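# Sort first: groupby only groups adjacent rows that share a key.
# If you need to speed things up, operator.itemgetter(0) can replace
# the lambda and also serve as the sort key.
for name, rows in groupby(sorted(t), lambda x: x[0]):
    print(join_rows(rows))  # join_rows is defined below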

The merging itself belongs in a separate function, for example:

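def join_rows(rows):
    def join_tuple(tup):
        # Return the first non-empty value in this column, or '' if there is none.
        for x in tup:
            if x:
                return x
        return ''
    # zip(*rows) transposes the rows, so each tuple holds one column.
    return [join_tuple(x) for x in zip(*rows)]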
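
Putting the two pieces together, here is a minimal self-contained sketch. It assumes the rows are already in memory as lists of strings and that the first non-empty value in a column should win; merge_group and the shortened sample rows are made up for illustration, and operator.itemgetter is used as the key for both sorting and grouping.

```python
from itertools import groupby
from operator import itemgetter

# Shortened sample rows (hypothetical data with the same shape as the question's).
rows = [
    ['Name1', '', '', '+'],
    ['Name1', '', 'b', ''],
    ['Name2', 'a', '', ''],
]

def merge_group(group):
    """Merge rows column by column, keeping the first non-empty value."""
    return [next((cell for cell in column if cell), '') for column in zip(*group)]

first_column = itemgetter(0)

# groupby only merges adjacent rows, so sort by the same key first.
# sorted() is stable, so rows with the same name keep their original order.
merged = [
    merge_group(group)
    for _, group in groupby(sorted(rows, key=first_column), key=first_column)
]

print(merged)
```

Sorting and grouping with the same key keeps the two steps consistent, which matters if the remaining columns should not influence the order.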
moooeeeep
1
def merge_rows(row1, row2):
    # merge two rows with the same name
    merged_row = ...
    return merged_row

r1 = ['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
r2 = ['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
r3 = ['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
r4 = ['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
rows = [r1, r2, r3, r4]
data = {}
for row in rows:
    name = row[0]
    if name in data:
        data[name] = merge_rows(row, data[name])
    else:
        data[name] = row

data now maps each name to its merged row: every key of the dictionary is a name, and the corresponding value is the single merged row for that name. You can now write these values to a CSV file.
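
The answer leaves merge_rows unspecified. One possible way to fill it in, as a sketch only: it assumes both rows have the same length and that the first non-empty cell in each column should be kept. The shortened sample rows and the in-memory buffer (standing in for a real output file) are illustrative assumptions.

```python
import csv
import io

def merge_rows(row1, row2):
    """Hypothetical merge: per column, keep the first non-empty cell."""
    # Assumes equal-length rows; zip would silently truncate otherwise.
    return [a or b for a, b in zip(row1, row2)]

# Shortened sample rows (hypothetical data).
rows = [
    ['Name1', '', '', '+'],
    ['Name1', '', 'b', ''],
    ['Name2', 'a', '', ''],
]

data = {}
for row in rows:
    name = row[0]
    # Merge into the existing row for this name, or store the first one seen.
    data[name] = merge_rows(data[name], row) if name in data else row

# Write the merged rows; an in-memory buffer stands in for an output file.
buffer = io.StringIO()
csv.writer(buffer).writerows(data.values())
output = buffer.getvalue()
print(output)
```

On Python 3.7 and later, dictionaries preserve insertion order, so the names come out in the order they first appeared in the input.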

Simeon Visser
  • Hi and thanks Simeon: I don't understand what is going on in the merged_row part. Where is the previous row (or rows) with the same name stored so that I can merge them? – jbssm Jun 14 '12 at 11:38
  • The current row that you're processing is `row` and the other is `data[name]`. The row in `data[name]` is either a previous row with that name or the result of one or more merges of rows with that name. So you only need to write the code that specifies how to merge two rows with the same name. If you write that code for `merged_row` then it'll repeatedly merge rows (even if there are three or more rows with the same name). – Simeon Visser Jun 14 '12 at 11:42
  • I have updated the code to make it a bit clearer. All you need to do is write `merge_rows` to specify how two rows with the same name need to be merged. – Simeon Visser Jun 14 '12 at 11:49
0

You can also use defaultdict:

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> _ = [d[i[0]].append(z) for i in t for z in i[1:]]
>>> d['Name1']
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

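This collects every cell for a name into one flat list, with the rows concatenated end to end. You still need to do the column joining yourself.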
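
One way the column joining could look for the flat lists, as a sketch under assumptions: every row has the same number of cells, and the first non-empty value per column wins. join_columns and the shortened sample rows are hypothetical.

```python
from collections import defaultdict

# Shortened sample rows (hypothetical data).
t = [
    ['Name1', '', '', '+'],
    ['Name1', '', 'b', ''],
    ['Name2', 'a', '', ''],
]

d = defaultdict(list)
for row in t:
    d[row[0]].extend(row[1:])  # same flat layout as in the answer

width = len(t[0]) - 1  # assumption: every row has this many cells after the name

def join_columns(cells, width):
    """Collapse concatenated rows into one row of `width` cells."""
    # cells[i::width] picks column i out of each concatenated row.
    return [next((c for c in cells[i::width] if c), '') for i in range(width)]

joined = {name: join_columns(cells, width) for name, cells in d.items()}
print(joined)
```

Storing whole rows per name instead of a flat list would avoid the need to know the width in advance.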

Burhan Khalid