I am trying to merge 30K CSVs in a directory, all with the same headers, into one file. With the code below I can merge them, but the header row is repeated each time a new file is appended, and I don't want the headers repeated in the output.
import pandas as pd
f = r'path/*.csv'
combined_csv = pd.concat([ pd.read_csv(f) for f in filenames ])
combined_csv.to_csv('output.csv', index=False, header=True)
Error:
Traceback (most recent call last):
File "merg_csv.py", line 4, in <module>
combined_csv = pd.concat([ pd.read_csv(f) for f in filenames ])
NameError: name 'filenames' is not defined
Edit: The solution provided in the answer below works, but after some time all the memory is used up and the program freezes, taking my screen with it.
import glob
import pandas as pd
dfs = []
for f in glob.glob("*.csv"):
    # error_bad_lines=False skips malformed rows
    # (deprecated in newer pandas in favour of on_bad_lines="skip")
    df = pd.read_csv(f, error_bad_lines=False)
    dfs.append(df)
all_data = pd.concat(dfs, ignore_index=True)
all_data.to_csv("00_final.csv", index=False, header=True)
How can I merge and write to the output file at the same time, so that I don't run into the low-memory problem? Something like the sketch below is what I have in mind. The total size of the inputs is about 1.5 GB, and there are more than 60K files.
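For reference, this is roughly the streaming approach I imagine, though I'm not sure it's the right way: copy each file's lines straight into the output, keeping the header from the first file only, so that only one line is ever held in memory. The glob pattern and output name are placeholders, and I'm assuming every file really does share an identical header.
import glob
with open("00_final.csv", "w") as out:
    # glob is evaluated once up front, so the output file is not
    # re-read even though its name matches the *.csv pattern
    for i, path in enumerate(sorted(glob.glob("*.csv"))):
        with open(path) as f:
            header = f.readline()
            if i == 0:
                out.write(header)  # write the header only once
            for line in f:  # stream the remaining rows line by line
                out.write(line)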
Thanks in advance!