70

I have a lot of dataframes created during preprocessing. Since I have only 6 GB of RAM, I want to delete all the unnecessary dataframes from memory to avoid running out of it when running GridSearchCV in scikit-learn.

1) Is there a function that lists only the dataframes currently loaded in memory?

I tried dir(), but it returns many objects other than dataframes.

2) I created a list of dataframes to delete

del_df=[Gender_dummies,
 capsule_trans,
 col,
 concat_df_list,
 coup_CAPSULE_dummies]

and ran

for i in del_df:
    del (i)

But this does not delete the dataframes. Deleting the dataframes individually, as below, does remove them from memory.

del Gender_dummies
del col
GeorgeOfTheRF
  • 2
    I noticed that there's no accepted answer for this question yet. I've found the answer [here](https://stackoverflow.com/a/39101287/6329945) to be particularly useful, at least in my personal experience. In essence, not even gc.collect() can ensure that you get your RAM back, but running your intermediate dataframes in a different process will ensure that the resources taken by the process are given back when your process ends. The link also has tips on how to reduce memory usage by Pandas, in general. – tunawolf Apr 02 '19 at 10:08
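
The separate-process idea from the comment above can be sketched roughly as follows. This is only an illustration using the standard library: the child code, the list standing in for large intermediate dataframes, and the names `CHILD_CODE` and `total` are assumptions, not something taken from the linked answer.

```python
import subprocess
import sys

# Code run by the child process: build a large intermediate object and
# print only the small result the parent actually needs.
CHILD_CODE = """
data = list(range(1_000_000))   # stands in for large intermediate dataframes
print(sum(data))
"""

result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    capture_output=True,
    text=True,
    check=True,
)

# Everything the child allocated goes back to the operating system when it exits;
# the parent only ever holds the small result.
total = int(result.stdout)
```

In a real pipeline the child would typically write its reduced result to disk (or return it through `multiprocessing`) rather than print it.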

5 Answers

92

The del statement does not delete an instance; it merely deletes a name.

When you do del i, you are deleting just the name i, but the instance is still bound to some other name, so it won't be garbage-collected.

If you want to release memory, your dataframes have to be garbage-collected, i.e. you have to delete all references to them.
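
A minimal sketch of why the loop in the question has no effect, assuming CPython. `Frame` is a hypothetical stand-in for a DataFrame, and the dict `ns` plays the role of the session's global namespace; neither comes from the original post.

```python
import gc
import weakref


class Frame:
    """Hypothetical stand-in for a large pandas DataFrame."""


# `ns` plays the role of globals() in an interactive session.
ns = {"Gender_dummies": Frame(), "col": Frame()}
probe = weakref.ref(ns["Gender_dummies"])  # lets us check whether the object is alive

del_df = [ns["Gender_dummies"], ns["col"]]
for i in del_df:
    del i  # unbinds only the loop variable `i`

# Still alive: both `ns` and `del_df` reference the object.
alive_after_loop = probe() is not None

del del_df  # drop the references held by the list
for name in ["Gender_dummies", "col"]:
    del ns[name]  # drop the named references
gc.collect()  # not strictly needed here; reference counting already freed them

alive_after_cleanup = probe() is not None
```

In an actual session the same pattern would be `del globals()[name]` for each name, which is equivalent to typing `del Gender_dummies` by hand.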

If you created your dataframes dynamically in a list, then removing that list will trigger garbage collection.

>>> lst = [pd.DataFrame(), pd.DataFrame(), pd.DataFrame()]
>>> del lst     # memory is released

If you also bound the dataframes to variables, you have to delete all of those as well.

>>> a, b, c = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
>>> lst = [a, b, c]
>>> del a, b, c # dfs still in list
>>> del lst     # memory is released now
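
For the first part of the question (listing only the dataframes), one possible sketch is to filter a namespace by type. The helper name `names_of_type` and the stand-in class `Frame` are made up for this illustration; with pandas installed you would pass `pd.DataFrame` and `globals()` instead.

```python
def names_of_type(namespace, cls):
    """Return the sorted names in `namespace` that are bound to instances of `cls`."""
    return sorted(
        name for name, value in namespace.items() if isinstance(value, cls)
    )


class Frame:
    """Hypothetical stand-in for pandas.DataFrame so the sketch runs without pandas."""


example_ns = {"a": Frame(), "b": Frame(), "n": 3}
found = names_of_type(example_ns, Frame)
```

In a notebook, `names_of_type(globals(), pd.DataFrame)` may also report names such as `_` or `_12`, because IPython's output cache keeps references to displayed results.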
pacholik
  • K. How to release memory in Python? – GeorgeOfTheRF Aug 27 '15 at 11:46
  • K. Why does "del Gender_dummies" work, but when I try to delete dataframes in a loop it does not? for i in del_df: del (i) – GeorgeOfTheRF Aug 29 '15 at 13:26
  • 9
    So, is this solution saying that in order to delete a number of dataframes, we have to first put them in a list and then delete the list? This sounds so inefficient. Not sure if I understood this correctly. – Saeed Feb 28 '18 at 15:03
  • 11
    @Saeed No. In order to delete a number of dataframes *that are also in list*, you have to `del` the list too. – pacholik Feb 28 '18 at 19:03
  • 1
    @pacholik So, if the dataframes aren't in a list, then just ```del``` of each dataframe works? – kjsr7 Nov 16 '20 at 05:16
  • 1
    @JayaKommuru Yes, exactly. – pacholik Nov 16 '20 at 09:28
  • What if I created a dataframe using `df = pd.concat([df1, df2], axis=0)`? Does `del df1, df2` also work? – Murilo Feb 10 '23 at 12:30
  • 1
    @Murilo `concat` copies data. So this would delete your original dataframes, but the new one would remain intact. – pacholik Feb 13 '23 at 10:09
25

In Python, automatic garbage collection deallocates objects that are no longer referenced (a pandas DataFrame is just another object as far as Python is concerned). There are different garbage collection strategies that can be tuned, though doing so requires significant learning.

You can manually trigger the garbage collection using

import gc
gc.collect()

But frequent calls to the garbage collector are discouraged, as collection is a costly operation and may affect performance.
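
As a rough illustration of when a manual `gc.collect()` actually matters, assuming CPython: objects caught in a reference cycle are not freed by reference counting alone, so they wait for the cycle detector. `Node` is a hypothetical class used only for this demo.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can take part in a reference cycle."""


gc.disable()  # keep the automatic collector out of the demo
node = Node()
node.me = node  # the object references itself: a cycle
probe = weakref.ref(node)

del node  # the name is gone, but the cycle keeps the refcount above zero
alive_before = probe() is not None

unreachable = gc.collect()  # the cycle detector finds and frees the object
alive_after = probe() is not None
gc.enable()
```

Objects without cycles are freed as soon as their last reference disappears, which is why a plain `del` is often enough and `gc.collect()` only helps in cases like this one.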

Reference

shanmuga
  • 2
    Thanks for this! Automatic garbage collection after `del df` doesn't seem to happen if I've done `df.iterrows()`, but `gc.collect()` seems to have the desired effect. – Nathan Lloyd Nov 17 '16 at 23:28
  • 1
    Awesome, very useful, especially when you work with large pandas dataframes that could drain all your memory. – mcrrnz Mar 27 '19 at 16:17
22

This deletes the dataframes and releases the memory:

import gc
import pandas as pd

del [[df_1, df_2]]     # unbind both names
gc.collect()           # collect anything left unreachable
df_1 = pd.DataFrame()  # rebind the names to empty dataframes
df_2 = pd.DataFrame()

In the statements above, the names referring to the dataframes are deleted first, so the dataframes are no longer reachable from Python. The garbage collector (gc.collect()) then collects whatever is left unreachable, and finally the names are explicitly rebound to empty dataframes.

More on how the garbage collector works is explained at https://stackify.com/python-garbage-collection/

hardi
    Welcome to Stack Overflow! While this code snippet may solve the question, [including an explanation](//meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, as this reduces the readability of both the code and the explanations! – Blue Mar 02 '18 at 02:52
1

I use intermediate dataframes in a notebook, and what you can easily do is simply write:

df = []

All the previous columns and rows are now gone, provided nothing else still references the old dataframe.
The name df still exists, but the memory it takes at that point is minimal.
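
A small sketch of what rebinding does, assuming CPython. `Frame` is a hypothetical stand-in for a DataFrame; the point is that rebinding frees the old object only when no other name still refers to it.

```python
import weakref


class Frame:
    """Hypothetical stand-in for a large pandas DataFrame."""


# Case 1: the name is the only reference.
df = Frame()
probe = weakref.ref(df)
df = []  # rebinding drops the last reference to the old object
released = probe() is None

# Case 2: another name still refers to the same object.
df2 = Frame()
alias = df2
probe2 = weakref.ref(df2)
df2 = []  # the old object survives through `alias`
still_alive = probe2() is not None
```

In a notebook, a previously displayed dataframe may be kept alive in the same way by IPython's output history.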

WildcatGeo
0

df1 = pd.DataFrame()

# delete the name df1; the memory is released once no other reference remains

del df1