1

Can I merge multiple csv files that have the same header in one csv using python? I would like to keep the same column headers but if i merge my files using the terminal it creates a file with repeating headers; so there's a way to do it in Python?

all.m
  • 63
  • 1
  • 6
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Irem Sep 17 '22 at 09:57

2 Answers2

2

i hope you find this helpful

import pandas as pd

# this two dataframes are just for showcasing the merge
# But can be replaced but any other valid DataFrame
# even more than 2 dataframes, just need to repeat this process

df1 = pd.DataFrame()
df1['col1'] = 1,2,3
# it even supports dataframes with not exactly the same columns
df1['col3'] = 4,4,8

df2 = pd.DataFrame()
df2['col1'] = 4,5,6
df2['col2'] = 1,4,8

# here df3 gets the resulting merged DataFrame
df3 = df1.append(df2)

# As you can see they get merged
# but as the df1 didnt had the "col2" its values are NaN
# Same with df2 and "col3"
>>   test1  col3  col2
0      1   1.0   NaN
1      2   4.0   NaN
2      3   8.0   NaN
0      4   NaN   4.0
1      5   NaN   4.0
2      6   NaN   8.0

# dumps dataframes to "file.csv" in the current folder
df3.to_csv("file.csv")
2

Quickest way is to use csvkit in particular csvstack.

cat csv_1.csv                                                                                                                                                              
id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'

cat csv_2.csv                                                                                                                                                              
id,col1,col2,col3
3,'bar','owl','crow'
4,'spam','eel','cow'

# To stdout
csvstack csv_1.csv csv_2.csv

id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'
3,'bar','owl','crow'
4,'spam','eel','cow'



# To new file.
csvstack csv_1.csv csv_2.csv > csv_1_2.csv

Using the global library csv.

import csv

header = None
new_file = []
for f in ('csv_1.csv', 'csv_2.csv'):
    with open(f, newline='') as csv_file:
        reader = csv.reader(csv_file)
        if not header:
            new_file.append(next(reader))
            header = True
        else:
            next(reader)
        for row in reader:
            new_file.append(row)

with open('csv_new_file.csv', 'w', newline='') as csv_out:
    writer = csv.writer(csv_out)
    writer.writerows(new_file)


cat csv_new_file.csv

id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'
3,'bar','owl','crow'
4,'spam','eel','cow'
Adrian Klaver
  • 15,886
  • 2
  • 17
  • 28
  • Why reader.__next__() and not just next(reader)? I’m pretty sure the latter is Pythonic and recommend, and the former is discouraged. And, I’m glad to see non-programming suggestions like csvkit! Also, writer.writerows(new_file) might be cleaner, if there’s nothing to do per-row while writing. – Zach Young Sep 18 '22 at 01:28
  • @ZachYoung. Updated with suggestions, thanks. To me the point of SO is to solve problems, if the best solution is an existing program then I see no issue in suggesting it. – Adrian Klaver Sep 18 '22 at 16:04