I'm working with a 400.000 rows dataframe (actually, is bigger, but for tests purposes I'm using this dimension).
I need to export to txt/csv multiple files based on two conditions: #RIC and Date.
Looping around these conditions becomes a really slow process, so I'm looking for some faster way to do this.
That's my original idea:
def SaveTxt(df, output_folder=None):
# Start time
start_time = time.time()
# Data Frame with date
df['Date'] = pd.to_datetime(df['Date-Time']).dt.date
dates = df['Date'].unique()
ticks = df['#RIC'].unique()
for tick in ticks:
for date in dates:
# print(date, tick)
# Filtering by instrument and date
temp_df = df[(df['#RIC'] == tick) & (df['Date'] == date)]
if temp_df.empty:
pass
else:
# Saving files
if output_folder in [None, ""]:
temp_df.to_csv("%s_%s.txt" % (date, tick))
else:
temp_df.to_csv("%s\\%s_%s.txt" % (output_folder, date, tick))
# Elapsed time
elapsed_time = time.time() - start_time
elapsed_time = time.strftime("%H:%M:%S", time.gmtime(elapsed_time))
# Priting elapsed time
print('Elapsed time: %s' % elapsed_time)
For 400.000 rows (equivalent of 5 days data) it takes 3 minutes to run this script. For one year, takes 6 hours and I didn't tried with 10 years, but I suppose that is not a good idea.
Solution Idea
I've tried to remove the data used in each loop from df, but this condition is not working (maybe this will remove the size of the data frame and will turn the code faster):
df = df[(df['#RIC'] != tick) & (df['Date'] != date)]
I believe this should remove every tick AND date from the data frame, but it's applying this condition separably.
I'll appreciate if you guys have some solution for this problem.
Thanks
Edit
Don't know if this is the best way to share a sample of the data (I can't upload under a proxy)
#RIC Date Price Volume DIJF21 16/10/2019 4.64 15 DIJF21 16/10/2019 4.64 40 DIJF21 16/10/2019 4.64 100 DIJF21 16/10/2019 4.64 5 DIJF21 16/10/2019 4.64 1765 DIJF21 16/10/2019 4.64 10 DIJF21 16/10/2019 4.64 100 DIJF21 16/10/2019 4.64 1000 DIJF21 16/10/2019 4.64 5 DIJF21 16/10/2019 4.64 20 DIJF21 16/10/2019 4.64 80 DIJF21 16/10/2019 4.64 25 DIJF21 16/10/2019 4.64 25 DIJF21 16/10/2019 4.64 150 DIJF20 15/10/2019 4.905 2000 DIJF20 15/10/2019 4.905 2000 DIJF20 15/10/2019 4.903 10