I am currently building up a pandas DataFrame in a loop that looks something like this:
for item in item_list:
    ~~~~ do something to the item ~~~~
    results_df = results_df.append(item)
This code is fine when the items being appended and results_df itself are both small. However, the items I am appending are reasonably large and the loop is long, so the loop takes a very long time to complete: each append copies the whole of results_df, and that copy gets more and more expensive as results_df grows.
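For concreteness, a minimal runnable version of that pattern looks something like the following (the item sizes and contents here are made up purely for illustration; also note that DataFrame.append was removed in pandas 2.0, and pd.concat called once per iteration has the same repeated-copy cost):

import pandas as pd
import numpy as np

# fabricated items, just to make the pattern runnable
item_list = [pd.DataFrame(np.random.rand(100, 3), columns=['a', 'b', 'c'])
             for _ in range(1000)]

results_df = pd.DataFrame(columns=['a', 'b', 'c'])
for item in item_list:
    item = item * 2  # placeholder for the real processing
    # every iteration copies all rows accumulated so far, so the total work
    # grows quadratically with the number of items appended
    results_df = pd.concat([results_df, item], ignore_index=True)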
One solution I can see is to append the pieces of each item to lists in a dictionary, like:
results_dict = {'result_1': [], 'result_2': [], 'result_3': []}
for item in item_list:
    item_1, item_2, item_3 = item
    ~~~~~ do something ~~~~
    results_dict['result_1'].append(item_1)
    results_dict['result_2'].append(item_2)
    results_dict['result_3'].append(item_3)
From the resulting dictionary the DataFrame can then be built. This is OK, but it does not seem optimal. Can anyone think of a better solution? N.B. the pieces within each item in item_list are reasonably large DataFrames on which some complex processing takes place, and the length of item_list is of the order of 1000.
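To be concrete about that last step, this is roughly what I mean (a sketch only; it assumes the pieces really are DataFrames that can simply be stacked, and the variable names are mine):

import pandas as pd

# if item_1, item_2 and item_3 are themselves DataFrames (as in my case),
# combine each list with a single concat after the loop finishes:
result_1_df = pd.concat(results_dict['result_1'], ignore_index=True)
result_2_df = pd.concat(results_dict['result_2'], ignore_index=True)
result_3_df = pd.concat(results_dict['result_3'], ignore_index=True)

# if they were scalar-like instead, the whole frame could be built in one call:
# results_df = pd.DataFrame(results_dict)

Appending to plain Python lists is cheap, and each piece is copied only once when the final concat runs rather than on every iteration, which is what avoids the repeated copying of the first loop.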