0

I am very new to this python programming. I have two csv files. I have to just merge them using the common column name. I have been trying it by looking at several other posts. But couldnt get that code running in my 2.5 version of python. So Could anyone help me regarding this. The files may look like this

File1
split_name, vcc, temp, c
A, 1,2,1
B,2,3,5

File 2
split_name, cout, i, vout
A, 2.5,2, 1
B, 2.4,1,8

Result file should be something like this

split_name,vcc,temp,c,cout,i,vout
A,         1,   2,  1, 2.5,2,1
B,         2,   3,  5, 2.4,1,8

The code that I was trying is :

import csv
import array
import os

#def readfile2(file2name):
r = csv.reader(open('file1.csv','r'))
dict2 = {row[0]: row[1:] for row in r}
print str(dict2)
#print dict2.keys()

#def readfile1(file1name):
reader1 = csv.reader(open('file2.csv','r'))
for row in reader1:
    dict1 = {row[0]: row[1:]}
    #print str(dict1)
    #print dict1.values()
    print str(dict1)



keys = set(dict1.keys() + dict2.keys())
with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])

But the error that I have encountered is:

keys = set((dict1.keys()) + (dict2.keys())) TypeError: unsupported operand type(s) for +: 'dict_keys' and 'dict_keys'

Note: I have installed python 3.4 version now.

Your help will be greatly appreciated!

rami92
  • 11
  • 3
  • 2
    Please provide the code that you have so far and tell us what exactly is going wrong. It will clarify thing for people answering your question. – Olga Jun 21 '15 at 20:07
  • @Olga I have edited the question with the code I was trying. Could pls help me with that? – rami92 Jun 22 '15 at 17:02

2 Answers2

1

First of all it's better to stick with particular version of python.

You get this error TypeError: unsupported operand type(s) for +: 'dict_keys' and 'dict_keys' because Python 3 return dictionary view for key() method unlike Python 2.5.

You wan to get union of keys from two dicts. You can say it to python this way:

set(dict1.keys()) | set(dict2.keys())

where | operator is union for two sets.

To solve your task should rewrite last line so writerows() got list of strings as argument, not list of lists. I think it would be better for you to use for cycle instead of list comprehension.

Here is the code for Python 3 with some changes and comments:

import csv

reader1 = csv.reader(open('file1.csv','r'))
titles1 = next(reader1)  # First row is titles. So we parse it separetly
dict1 = {row[0]: row[1:] for row in reader1}

reader2 = csv.reader(open('file2.csv','r'))
titles2 = next(reader2)
dict2 = {}  # If we skipt this and do nto change behaviour in cilce we
            # will get only last row every time
for row in reader2:
    dict2[row[0]] = row[1:]

keys = set(dict1.keys()) | set(dict2.keys())
with open('output.csv', 'w', newline='') as f:  # For CVS it's beeter to use
                                                # text mode, not binary.
    w = csv.writer(f, delimiter=',')

    w.writerow(titles1 + titles2)
    for key in keys:
        w.writerow([key, ] +
                dict1.get(key, [''] * (len(titles1)-1)) +
                dict2.get(key, [''] * (len(titles2)-1))
                )
Lehych
  • 166
  • 1
  • 12
0

You can do this most easily using the join function from pandas. If you cannot install pandas, you can reimplement the csv-loading and joining functionality in pure python, but I think in the long run you're better off with pandas.

import pandas
df1 = pandas.read_table('file1.csv')
df2 = pandas.read_table('file2.csv')
joined = df1.join(df2)

You can play around with the parameters to read_table and join to get exactly the behavior you want. Assuming split_name is a unique identifier for each row in both files, you will probably want to use it as the "index" for both of the dataframes.

cxrodgers
  • 4,317
  • 2
  • 23
  • 29
  • I'm not sure how easy it would be to get pandas running in Python 2.5. – DSM Jun 22 '15 at 16:59
  • @cxrodgers I have tried using pandas. But I got error that there is no module name pandas. Could you let me know any other way to do this? – rami92 Jun 22 '15 at 17:04
  • @cxrodgers I have now installed 3.4 version. But couldnt get that done using pandas. – rami92 Jun 22 '15 at 17:06
  • @DSM Yes. I am not able to get pandas in 3.4 version too. I have seen ur reply for the post http://stackoverflow.com/questions/20414562/python-joining-csv-files-where-key-is-first-column-value . I am struck with the same code at iteritems. It is throwing an error which says that OrderedDict object has no attribute iteritems. Can you please help me in fixing that? – rami92 Jun 22 '15 at 18:00
  • Sorry, I missed that you were using Python 2.5. If you upgrade to a recent version of python, you should be able to install pandas using pip ("pip install pandas"). – cxrodgers Jun 25 '15 at 02:14