I have 1300 CSV files in a directory.
Each file has a date in the first column, followed by daily data covering the last 20-30 years across another 8 columns.
So each file, for example Data1.csv, has a header like this:
Date source1 source2 source3 source4 source5 source6 source7 source8
All 1300 files are uniquely named.
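For reference, this is roughly how a single file reads in (a minimal sketch; the Data1.csv path, parse_dates and the row counts are assumptions on my part):

import pandas as pd

# One file: Date becomes the index, leaving the 8 source columns
df = pd.read_csv("Data1.csv", index_col=0, parse_dates=True)
print(df.shape)  # roughly (7300 to 11000, 8) for 20-30 years of daily rows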
I am trying to merge all of these into one DataFrame using pandas like this:
import os
import pandas as pd

frame = pd.DataFrame()
length = len(os.listdir(filepath))  # filepath is the directory holding the 1300 CSV files
for filename in os.listdir(filepath):
    file_path = os.path.join(filepath, filename)
    print(length, end=" ")  # countdown of files left to process
    df = pd.read_csv(file_path, index_col=0)
    # reshape the 8 source columns into long format: a 'Data' column plus a
    # 'Source' label of the form '<filename>-<column>'
    df = pd.concat([df[[col]].assign(Source=f'{filename[:-4]}-{col}').rename(columns={col: 'Data'}) for col in df])
    frame = frame.append(df)
    length -= 1
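For what it's worth, I think the per-column concat inside the loop amounts to a single melt per file, something like this sketch (where df is the frame as read by read_csv, filename is the current file, and I'm assuming the index column is named Date):

# Assumed equivalent reshape of one file using melt instead of the per-column concat;
# ignore_index=False keeps the Date index repeated for every source column (pandas >= 1.1)
long_df = df.melt(var_name='Source', value_name='Data', ignore_index=False)
long_df['Source'] = filename[:-4] + '-' + long_df['Source']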
But by around the 300th file the frame has about 12 million rows and my code really slows down...
Is there a way to speed this up before my computer runs out of memory?
My end goal is actually a massive DataFrame that is 1 + (1300 x 8) columns wide, with one row per date across roughly 30 years of daily data.
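For context, one thing I am considering (but have not profiled) is collecting the per-file frames in a list and concatenating once at the end, rather than growing frame inside the loop; a rough sketch under the same assumptions as above:

import os
import pandas as pd

frames = []
for filename in os.listdir(filepath):
    df = pd.read_csv(os.path.join(filepath, filename), index_col=0)
    # same long reshape as in the loop above
    long_df = df.melt(var_name='Source', value_name='Data', ignore_index=False)
    long_df['Source'] = filename[:-4] + '-' + long_df['Source']
    frames.append(long_df)

frame = pd.concat(frames)  # one concat at the end instead of appending file by file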