8

I try to join two csv files where key is value of first column. There's no header.
Files have different number of lines and rows.
Order of file a must be preserved.

file a:

john,red,34
andrew,green,18
tonny,black,50
jack,yellow,27
phill,orange,45
kurt,blue,29
mike,pink,61

file b:

tonny,driver,new york
phill,scientist,boston

desired result:

john,red,34
andrew,green,18
tonny,black,50,driver,new york
jack,yellow,27
phill,orange,45,scientist,boston
kurt,blue,29
mike,pink,61

I examined all related threads and I am sure that some of you are gonna mark this question duplicate but I simply have not found solution yet.

I grabbed dictionary based solution but this approach does not handle preserve line order from file 'a' condition.

import csv
from collections import defaultdict

with open('a.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict1 = {}
    for row in r:
        dict1.update({row[0]: row[1:]})

with open('b.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict2 = {}
    for row in r:
        dict2.update({row[0]: row[1:]})

result = defaultdict(list)

for d in (dict1, dict2):
    for key, value in d.iteritems():
        result[key].append(value)

I also would like to avoid putting these csv files to the database like sqlite or using pandas module.

Thanks in advance

daikini
  • 1,307
  • 6
  • 23
  • 36

1 Answers1

6

Something like

import csv
from collections import OrderedDict

with open('b.csv', 'rb') as f:
    r = csv.reader(f)
    dict2 = {row[0]: row[1:] for row in r}

with open('a.csv', 'rb') as f:
    r = csv.reader(f)
    dict1 = OrderedDict((row[0], row[1:]) for row in r)

result = OrderedDict()
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result.setdefault(key, []).extend(value)

with open('ab_combined.csv', 'wb') as f:
    w = csv.writer(f)
    for key, value in result.iteritems():
        w.writerow([key] + value)

produces

john,red,34
andrew,green,18
tonny,black,50,driver,new york
jack,yellow,27
phill,orange,45,scientist,boston
kurt,blue,29
mike,pink,61

(Note that I didn't bother protecting against the case where dict2 has a key which isn't in dict1-- that's easily added if you like.)

DSM
  • 342,061
  • 65
  • 592
  • 494