I have a performance problem: the following code took 3 hours to get through 5,000 of 50,000 items.
I have a dataframe df and a list of dictionary keys to loop through, key_list. Each key corresponds to a single index of the dataframe. At each index, I want the mean of the columns in mean_cols over a few rows before and a few rows after that index, and then to build a new dictionary holding the before and after values.
mean_cols = ['A', 'B', 'C']
rows_list = []
key_list = list(some_dict.keys())  # around 50k items
for key in key_list:
    row_dict = {}  # fresh dict for each key
    # mean over the 5 rows starting at key, and over the 5 rows just before it
    means_after = df[mean_cols].iloc[key:key+5].mean()
    means_before = df[mean_cols].iloc[key-5:key].mean()
    for col in mean_cols:
        row_dict[col + '_after'] = round(means_after[col], 2)
        row_dict[col + '_before'] = round(means_before[col], 2)
    rows_list.append(row_dict)
I am pretty sure the bottleneck is these two lines:
means_after = df[mean_cols].iloc[key:key+5].mean()
means_before = df[mean_cols].iloc[key-5:key].mean()
However, I can't think of a faster way to do it. Does anyone have any ideas?
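
One direction that might be worth trying, offered as a sketch rather than a confirmed fix: compute the windowed means once for the whole frame with rolling, then pick out the rows for the keys in a single positional lookup instead of slicing inside the loop. The helper name window_means, the synthetic df, and the contents of some_dict below are all made up for illustration. The sketch also assumes the keys are unique, non-negative row positions and that the columns contain no NaN values.

```python
import numpy as np
import pandas as pd


def window_means(df, keys, cols, n=5):
    """Means of `cols` over the n rows before and the n rows starting at each key.

    Hypothetical helper: `keys` are assumed to be unique, non-negative
    row positions, and `cols` are assumed to contain no NaN values.
    """
    data = df[cols].reset_index(drop=True)
    # Rows key-n .. key-1: trailing rolling mean, shifted down by one row.
    # Positions with fewer than n preceding rows come out as NaN.
    before = data.rolling(n).mean().shift(1)
    # Rows key .. key+n-1: roll over the reversed frame, then flip it back.
    # min_periods=1 lets the window shrink near the end of the frame.
    after = data[::-1].rolling(n, min_periods=1).mean()[::-1]
    pos = np.asarray(list(keys), dtype=int)
    return pd.concat(
        [after.iloc[pos].add_suffix("_after"),
         before.iloc[pos].add_suffix("_before")],
        axis=1,
    )


# Synthetic stand-ins for the real df and some_dict (assumptions).
rng = np.random.default_rng(0)
mean_cols = ["A", "B", "C"]
df = pd.DataFrame(rng.random((1000, 3)), columns=mean_cols)
some_dict = {int(k): None for k in rng.choice(1000, size=200, replace=False)}

result = window_means(df, some_dict.keys(), mean_cols).round(2)
rows_list = result.to_dict("records")  # one dict per key, as in the loop
```

The rolling calls do the averaging for every row in one pass, so the per-key cost is reduced to an iloc lookup. Whether this is actually faster on the real data would need to be measured.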