This is a variation of an earlier question I posted: Merging 1300 data frames into a single frame becomes really slow.
This time I am trying to merge the frames column-wise, not stack them below one another.
I have 1300 csv files in a directory.
Each file has a date in the first column, followed by 8 columns of daily data covering the last 20-30 years. So each file, e.g. Data1.csv, looks like this:

Date source1 source2 source3 source4 source5 source6 source7 source8
I have 1300 uniquely named files, each with uniquely named columns. I am trying to merge them all into one dataframe with pandas column-wise, i.e. not appending rows below one another but joining left and right on the date index, like this:
import os
import pandas as pd

frame = pd.read_csv(file_path, index_col=0)        # file_path points to the first file

for filename in os.listdir(filepath):              # filepath is the directory with the rest of the files
    file_path = os.path.join(filepath, filename)
    df = pd.read_csv(file_path, index_col=0)
    df = df.groupby(['Date']).first()              # collapse any duplicate dates
    df = df.add_prefix(f"{filename}-")             # keep column names unique per file
    frame = pd.merge(frame, df, how='outer', left_index=True, right_index=True)
But around the 300th file my code really slows down; I suspect each merge is copying the entire growing frame again.
My goal is a single massive dataframe that is 1 + (1300 x 8) columns wide, with one row per date over the 30 years.
Is there a way to speed this up before my computer runs out of memory?
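
Would collecting all the frames in a list and doing a single pd.concat along axis=1 at the end avoid the repeated copying? Something like this sketch is what I had in mind (filepath being the same directory variable as above):

import os
import pandas as pd

frames = []
for filename in os.listdir(filepath):                     # filepath is the directory with all 1300 csv files
    df = pd.read_csv(os.path.join(filepath, filename), index_col=0)
    df = df.groupby('Date').first()                        # collapse any duplicate dates within a file
    frames.append(df.add_prefix(f"{filename}-"))           # keep column names unique per file

# one concat aligns every frame on the Date index in a single pass
frame = pd.concat(frames, axis=1, join='outer')

Would that also behave better memory-wise, or does concat still have to build the whole thing at once?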