I am iterating over a pandas DataFrame:

for index, row in df.iterrows():
    ...
    if condition:
        df.at[index, 'col'] = newValue

and every time df.at executes, memory usage increases slightly.

The DataFrame has one million rows, so the machine eventually runs out of memory and crashes.

A few observations:

  • If I do not update the df, memory does not grow, so the leak appears to come from the update through the at method
  • I have tried splitting the df into several chunks and calling gc.collect() after each chunk, and also del df_chunk_i followed by gc.collect(), but the memory is not released
  • I have seen some other somewhat related SO questions that suggest using inplace=True, but I can't see how that applies here

Is there an alternative that prevents the memory leak?

user
  • Does this answer your question? [Memory leak using pandas dataframe](https://stackoverflow.com/questions/14224068/memory-leak-using-pandas-dataframe) – RoseGod Jan 06 '22 at 08:56
  • Thank you for the reference, RoseGod. gc.collect() seems to "fix" that issue (according to the GitHub thread), but it does not fix the issue above. – user Jan 06 '22 at 11:22
  • If you make a minimal reproducible example, it will be easier to answer your question. – Tim Boddy Jan 06 '22 at 20:05
  • Tim, thank you for the suggestion, which actually solved my problem. I tried to reproduce the issue with a DataFrame of random values and it did not happen. In my original code I was updating the DataFrame with percentages stored as strings; after converting everything to floats, the memory leak disappeared. So by working with floats instead of strings I no longer have the problem. – user Jan 07 '22 at 22:10

1 Answer

The memory leak only appeared when the DataFrame column held strings; once I switched to floats, the leak disappeared.

I was initially writing strings to the DataFrame, and those strings were percentages, e.g. 17% instead of 0.17. After moving to floats, everything worked as expected.
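
For anyone hitting the same thing, here is a minimal sketch of the conversion described above. The column name, the sample values, the helper `percent_strings_to_floats`, and the update condition are all made up for illustration; they are not from my real code. It also swaps the `iterrows()` loop for a boolean mask with `df.loc`, which is an alternative I am assuming fits the update, not something I have tested on the original data.

```python
import pandas as pd


def percent_strings_to_floats(series: pd.Series) -> pd.Series:
    """Turn strings such as '17%' into fractions such as 0.17 (hypothetical helper)."""
    # Strip the trailing percent sign, parse as float, scale to a fraction.
    return series.str.rstrip("%").astype(float) / 100.0


# Illustrative data: percentages stored as Python strings in an object column.
df = pd.DataFrame({"col": pd.Series(["17%", "5%", "80%"], dtype=object)})

# Convert the whole column once instead of writing strings cell by cell.
df["col_float"] = percent_strings_to_floats(df["col"])

# Vectorized conditional update in place of iterrows() + df.at.
# The condition (> 0.5) and the new value (0.5) are placeholders.
mask = df["col_float"] > 0.5
df.loc[mask, "col_float"] = 0.5

# deep=True also counts the Python string objects held by the object column.
mem = df.memory_usage(deep=True)
print(df.dtypes)
print(mem)
```

The `memory_usage(deep=True)` call is only there to compare the footprint of the string column with the float column; it does not by itself explain why the string updates kept growing memory.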

user