How can I concatenate all the .csv files in a directory column-wise (to the right) into one file with Python?

Question

I have a folder of .csv files. All the files have the same ids but different content, like this:

File one:

id, content
jdhfs_SDGSD_9403, bla bla bla bla
aadaaSDFDS__ASdas_asad_342, bla bla
...
asdkjASDAS_asdasSFSF_sdf, bla bla

File two:

id, content
jdhfs_SDGSD_9403, string string string
aadaaSDFDS__ASdas_asad_342, string string string
...
asdkjASDAS_asdasSFSF_sdf, string string string

I would like to keep the id column but merge the content into one new file, something like this (i.e. generate a new file):

id, content
jdhfs_SDGSD_9403, bla bla bla bla string string string
aadaaSDFDS__ASdas_asad_342, bla bla string string string
...
asdkjASDAS_asdasSFSF_sdf, bla bla string string string

This is what I tried:

from itertools import izip_longest
with open('path/file1.csv', 'w') as res, \
        open('/path/file1.csv') as f1,\
        open('path/file1.csv') as f2:
    for line1, line2 in izip_longest(f1, f2, fillvalue=""):
        res.write("{} {}".format(line1.rstrip(), line2))

The problem is that this merges everything into one line. Any idea how to do this in a more Pythonic way?

Edit:

import pandas as pd

df1 = pd.read_csv('path/file1.csv')
df2 = pd.read_csv('path/file2.csv')

new_df = pd.concat([df1, df2], axis=1)
print(new_df)

new_df.to_csv('/path/new.csv')

Then the header was merged like this:

,id,content,id,content

And the content like this:

0jdhfs_SDGSD_9403, bla bla bla bla jdhfs_SDGSD_9403, string string string.

How can I get something like this:

jdhfs_SDGSD_9403, bla bla bla bla string string string

without the DataFrame's index number?

score 1 · Answer 1 · edited May 23 '17 at 11:43


Read the CSVs in using pd.read_csv(FILE).

Then do this:

import pandas as pd
pd.concat([df1, df2], axis=1)

Or merge them with pd.merge().

See this question:

Combine two Pandas dataframes with the same index
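
A minimal sketch of the merge route, assuming both files have exactly the columns id and content and that each id appears once per file. The in-memory io.StringIO buffers and the shortened ids are stand-ins for the real files, and the names merged, content_1 and content_2 are only illustrative. Merging on id avoids the duplicated id column that concat with axis=1 produced in the question, and index=False leaves the row numbers out of the CSV.

```python
import io

import pandas as pd

# Stand-ins for the two files from the question. skipinitialspace drops the
# blank after each comma, so the columns are named "id" and "content".
file1 = io.StringIO("id, content\na_1, bla bla\nb_2, bla\n")
file2 = io.StringIO("id, content\na_1, string string\nb_2, string\n")

df1 = pd.read_csv(file1, skipinitialspace=True)
df2 = pd.read_csv(file2, skipinitialspace=True)

# Align rows on id rather than on position; the two content columns are
# kept apart by the suffixes.
merged = df1.merge(df2, on="id", suffixes=("_1", "_2"))

# Join the two content columns with a space into a single column.
merged["content"] = merged["content_1"] + " " + merged["content_2"]

# With no path, to_csv returns the CSV text; index=False omits the index.
result = merged[["id", "content"]].to_csv(index=False)
print(result)
```

Passing a path such as 'path/new.csv' as the first argument to to_csv should write the file instead of returning a string.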

answered Mar 02 '15 at 19:33 by Liam Foley · edited May 23 '17 at 11:43 by Community

Thanks for the help. Can pandas generate a new file? – john doe Mar 02 '15 at 19:36
@johndoe YOURDF.to_csv('filename.csv') – Liam Foley Mar 02 '15 at 19:43
Thanks for the support. Any idea how to remove the DataFrame's index number from the new file? – john doe Mar 02 '15 at 22:14
@johndoe Look into the reset_index and set_index methods. – Liam Foley Mar 02 '15 at 22:15
I tried this: `new_df = pd.concat([df1['content'], df2['content']], axis=1); new_df.reset_index(drop=True); new_df['content'].to_csv('path/new.csv')` but the index number is still in the new file. Thanks for the feedback! – john doe Mar 02 '15 at 22:25
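
On the index question raised in these comments, a small sketch, assuming a toy DataFrame rather than the real data: reset_index only renumbers the index and returns a new frame, so to_csv still writes it, whereas passing index=False to to_csv leaves it out.

```python
import pandas as pd

# Toy data; the ids and contents are placeholders, not the real files.
df = pd.DataFrame({"id": ["a_1", "b_2"], "content": ["bla", "string"]})

# reset_index gives a fresh 0..n-1 index but does not remove it, so the
# CSV still starts with an unnamed index column.
with_index = df.reset_index(drop=True).to_csv()

# index=False tells to_csv not to write the index at all.
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```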

horns · Answer 2 · 2015-03-02T23:03:59.823


Use Python's standard csv module, e.g.:

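import csv

# Placeholder paths based on the question; adjust them to your own files.
filename1, filename2, newname = "path/file1.csv", "path/file2.csv", "path/new.csv"

# newline="" lets the csv module handle line endings itself (Python 3).
with open(filename1, newline="") as file1, \
        open(filename2, newline="") as file2, \
        open(newname, "w", newline="") as newfile:
    # skipinitialspace drops the blank that follows each comma in the input.
    csv1 = csv.reader(file1, skipinitialspace=True)
    csv2 = csv.reader(file2, skipinitialspace=True)
    newcsv = csv.writer(newfile)

    header = next(csv1)
    next(csv2)  # Skip the second file's header

    newcsv.writerow(header)

    # zip pairs rows by position and stops at the end of the shorter file.
    for row1, row2 in zip(csv1, csv2):
        row_id, content1 = row1
        _, content2 = row2
        newcsv.writerow((row_id, " ".join((content1, content2))))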
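
The title asks about every .csv file in a directory, so here is a hedged sketch of how the same idea might be extended with glob. The function name merge_csv_dir is hypothetical, and it assumes two-column files (id first, content second), files processed in sorted filename order, and Python 3.7+ so the dict keeps ids in first-seen order. Unlike the zip version, rows are matched by id rather than by position.

```python
import csv
import glob
import os


def merge_csv_dir(directory, out_path):
    """Join the content column of every .csv in directory, matched by id."""
    merged = {}  # id -> list of content strings, in first-seen id order
    header = None
    out_abs = os.path.abspath(out_path)

    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        if os.path.abspath(path) == out_abs:
            continue  # do not read a previous output file back in
        with open(path, newline="") as f:
            reader = csv.reader(f, skipinitialspace=True)
            file_header = next(reader, None)
            if file_header is None:
                continue  # empty file
            if header is None:
                header = file_header  # reuse the first file's header
            for row in reader:
                if len(row) != 2:
                    continue  # assumption: ignore rows that are not (id, content)
                row_id, content = row
                merged.setdefault(row_id, []).append(content)

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        for row_id, contents in merged.items():
            writer.writerow((row_id, " ".join(contents)))
```

A call might look like merge_csv_dir("path", "path/new.csv"). Everything is held in memory, which should be fine for small files but may not suit very large ones.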

answered Mar 02 '15 at 20:20 by horns · edited Mar 02 '15 at 23:03

I got this: `newcsv.write(header) AttributeError: '_csv.writer' object has no attribute 'write'`. Thanks for the help. – john doe Mar 02 '15 at 22:02
@johndoe That should be writerow, sorry. – horns Mar 02 '15 at 23:03
