I'm a student in Korea, and I'm using Python to analyze options (finance) data. I'm looking for a way to speed up my Python code.
The target data is the per-minute transaction records of the options, covering 2015 to 2019. Because the data is split across 1227 text files (one per workday over the 5 years), I tried to concatenate all 1227 files into a single preprocessed file to minimize file access: I will reuse the result file (the concatenated, preprocessed file) repeatedly, and opening every separate daily file each time took too long. Below is the relevant part of my code, followed by a small snippet showing how I plan to reuse the result file.
import pandas as pd

#file_name is a list of the names of all 1227 daily files, ordered by date
#location is the directory path containing those files (defined elsewhere)
result_df = pd.DataFrame()
for f in file_name:
    data_opt = pd.read_csv(location + f, header=None, sep="\t")
    #do something
    #...
    #...
    oneday_df = pd.concat([minute_call, minute_put], axis=0)  #result of processing one day of data
    result_df = pd.concat([result_df, oneday_df], axis=0)     #append the day to the running result

result_df.to_csv()  #write out the concatenated result (actual output path omitted here)
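For context, the goal of the preprocessing above is that later analysis scripts only read this one concatenated file instead of looping over the 1227 daily files again. Something along these lines (the file name "options_2015_2019.csv" and the variable all_data are just placeholders, not my actual names):

import pandas as pd

# load the single preprocessed file once per analysis run
all_data = pd.read_csv("options_2015_2019.csv")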
This code works and produces the correct result. However, I noticed that it slows down over time: it processes the early files quickly, but gets slower and slower as it reaches the later files. Is there a better way to speed up this code?
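One thing I was considering (though I'm not sure it actually addresses the slowdown) is to collect each day's DataFrame in a plain Python list and call pd.concat only once after the loop, instead of growing result_df on every iteration. A rough sketch using the same placeholder names (file_name, location, minute_call, minute_put) as above:

import pandas as pd

daily_frames = []  # collect each day's processed DataFrame here
for f in file_name:
    data_opt = pd.read_csv(location + f, header=None, sep="\t")
    # ... same per-day processing as above, producing minute_call and minute_put ...
    oneday_df = pd.concat([minute_call, minute_put], axis=0)
    daily_frames.append(oneday_df)  # no concat of the running result inside the loop

result_df = pd.concat(daily_frames, axis=0)  # single concat at the end
result_df.to_csv()

Would something like this help, or is there a better approach?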
(Sorry for my awkward English, and thank you for reading the whole question.)