0

I am trying to write a python script that combines entries in a csv file.

a, 1, A
a, 2, B
b, 3, C
b, 4, D

to

a, 1, A, 2, B
b, 3, C, 4, D

The image below shows a more complete picture of my csv file.

image

I've tried using the csv and pandas libraries, but I'm lost. I am new to coding and this is way over my head right now.

EDIT: It will not always be 2 rows for each letter, it can vary and if that's the case, each one will add onto the row, for example it might be

a, 1, A, 2, B, 3, C, 4, D

b, 5, E, 6, F

c, 7, G, 8, H, 9, J, 10, K, 11, L, 12, M –

Seth
  • 11
  • 1
  • Here's a hint: Create a `data` dictionary to store all data. The key for this dict will be the first element of each row. The value will be a list containing all the elements you want in the final result. Read the csv file, and iterate over each row. The first element of each row tells you the "key". If this key doesn't exist in `data`, set it to the current row. If it does exist in `data`, append the other elements of the current row to the preexisting value. When you're done, the values of this dictionary give you the rows of your new csv file – Pranav Hosangadi Jul 12 '21 at 17:17
  • You've shown the value when the group of the first letter has two rows; what if it doesn't? – Neil Jul 12 '21 at 17:21
  • 1
    @Neil, It will not always be 2 rows for each letter, it can vary and if that's the case, each one will add onto the row, for example it might be a, 1, A, 2, B, 3, C, 4, D b, 5, E, 6, F c, 7, G, 8, H, 9, J, 10, K, 11, L, 12, M – Seth Jul 12 '21 at 17:26
  • @Seth -- I recommend adding that explanation to the question by editing it. Any clarifying details should be included in the question so those trying to answer will not need to go through the comment thread. – Ben Y Jul 12 '21 at 17:36

2 Answers2

1

Simple enough with a defaultdict

from io import StringIO
import csv
import sys
from collections import defaultdict

data = '''\
a, 1, A
a, 2, B
b, 3, C
b, 4, D
b, 5, E
c, 6, F
'''

out = defaultdict(list)

f = StringIO(data)
reader = csv.reader(f)

for (x, y, z) in reader:
    out[x].extend([y, z])

writer = csv.writer(sys.stdout)
for k, v in out.items():
    writer.writerow([k] + v)

Outputs

a, 1, A, 2, B
b, 3, C, 4, D, 5, E
c, 6, F
0

You can use itertools.groupby in conjunction with operator.itemgetter and itertools.chain for this:

The main thing here is groupby that groups by the first item, then itemgetter is a more sophisticated lambda x: x[n] and chain just combines multiple iterables for us.

from itertools import groupby, chain
from operator import itemgetter

data = [['a', 1, 'A'],
        ['a', 2, 'B'],
        ['b', 3, 'C'],
        ['b', 4, 'D']]
        
output = [[k, *chain(*map(itemgetter(1,2), g))] for k, g in groupby(data, itemgetter(0))]
print(output)

[['a', 1, 'A', 2, 'B'], ['b', 3, 'C', 4, 'D']]

Note: Make sure the rows are sorted by the first key otherwise this won't work as intended. You would likely want to use collections.defaultdict for it instead if that is the case

Jab
  • 26,853
  • 21
  • 75
  • 114