I am iterating over a pandas DataFrame:
```python
for index, row in df.iterrows():
    ...
    if condition:
        df.at[index, 'col'] = newValue
```
and every time `df.at` executes, my memory usage increases a little. Since I am iterating over a one-million-row DataFrame, the machine eventually runs out of memory and crashes.
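For context, this is roughly how I am watching the growth; the `psutil` instrumentation below is just for illustration and is not part of the update logic:

```python
import os

import psutil  # third-party: pip install psutil

process = psutil.Process(os.getpid())

def rss_mb():
    """Resident set size of this process, in MB."""
    return process.memory_info().rss / 1e6

# Sampled periodically inside the loop above, e.g.:
#     if index % 100_000 == 0:
#         print(index, rss_mb())
# The printed RSS climbs steadily while the updates run.
```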
A few observations:
- If I do not update the df, the memory does not surge, so the leak is tied to the update via the `at` method.
- I have tried breaking the df into several chunks and calling `gc.collect()` after each chunk, or trying `del df_chunk_i` and then `gc.collect()`, but the memory is not released (a sketch of this attempt follows the list).
- I have seen some other somewhat related SO questions that propose `inplace=True`, but I can't see how that applies here.
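Here is a minimal sketch of the chunked variant I mentioned; the dummy data, chunk size, and placeholder condition are illustrative stand-ins for my real DataFrame and logic:

```python
import gc

import numpy as np
import pandas as pd

# Dummy stand-in for the real one-million-row DataFrame
df = pd.DataFrame({'x': np.random.rand(1_000_000),
                   'col': np.zeros(1_000_000)})

chunk_size = 100_000
for start in range(0, len(df), chunk_size):
    chunk = df.iloc[start:start + chunk_size]
    for index, row in chunk.iterrows():
        if row['x'] > 0.5:             # placeholder for the real condition
            df.at[index, 'col'] = 1.0  # placeholder for newValue
    del chunk
    gc.collect()  # memory is still not released after each chunk
```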
Is there an alternative that prevents the memory leak?