
I'm trying to understand the `del` statement in Python. I'm merging two large data frames and then trying to delete the participating data frames to free memory.

import pandas as pd

new_df = pd.merge(df1, df2, on='level', how='inner')
del df1
del df2

Is this going to release the memory allocated by df1 and df2, or just delete the references?

I have also tried using `gc.collect()`; however, I'm getting a MemoryError on the first operation after the merge.

sytech
Niyas

1 Answer


No, `del` doesn't directly manage memory. It unbinds the name, which decrements the object's reference count - every Python object has one.

Once an object's reference count drops to 0, it is no longer accessible through any variable, and CPython reclaims its memory immediately; the cyclic garbage collector only steps in for objects that keep each other alive through reference cycles.

So, in summary, calling `del` on a name will eventually result in the object's memory being reclaimed, as long as no other variable currently refers to that object.
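To make this concrete, here is a minimal sketch of how `del` interacts with the reference count, using `sys.getrefcount` (which reports one extra reference because its own argument temporarily refers to the object):

```python
import sys

a = [1, 2, 3]              # one reference: the name 'a'
b = a                      # second reference to the same list
print(sys.getrefcount(a))  # 3: 'a', 'b', and getrefcount's own argument

del a                      # unbinds 'a'; the list survives because 'b' still refers to it
print(sys.getrefcount(b))  # 2: 'b' and getrefcount's own argument

del b                      # last reference gone -> CPython frees the list immediately
```

Note that `del` removes a name, not the object itself; the object only goes away when every reference to it is gone.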

Also see Is there a way to get the current ref count of an object in Python?

cs95
  • Thanks. Then what is the best way to release memory allocated by df1 and df2? – Niyas Mar 06 '18 at 17:24
  • 1
    @niyas `del` will decrement the reference counter. If the count drops to 0 (i.e., no other variables pointing to the same objects), then the memory will be reclaimed eventually. To force garbage collection, use `import gc; gc.collect()`. – cs95 Mar 06 '18 at 17:27
  • @COLDSPEED I tried using gc, but it doesn't seem to work. I'm getting a memory error in the first operation after the merge. – Niyas Mar 06 '18 at 17:31
  • 3
    @niyas Ah, I see what your problem is. Unfortunately, you are barking up the wrong tree. The memory error occurs during the merge, when you need both your dataframes in memory to create the third one. Try looking at distributed dataframes with `dask` or spark, those are much better at handling big data. – cs95 Mar 06 '18 at 17:33
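Following up on that last comment, one pandas-only way to bound peak memory during the merge is to stream one frame through in chunks, so only a slice of it is in memory at a time. This is just a sketch with tiny made-up data; in practice the chunks would come from disk via `pd.read_csv(path, chunksize=...)`:

```python
import io
import pandas as pd

# Tiny stand-ins for the real data (hypothetical column names).
csv_data = io.StringIO("level,x\n1,10\n2,20\n1,30\n")
df2 = pd.DataFrame({'level': [1, 2], 'y': ['a', 'b']})

# Merge chunk-by-chunk: only one slice of the left frame plus df2
# and the partial results are in memory at any point, instead of
# df1, df2, and the full merged result all at once.
pieces = []
for chunk in pd.read_csv(csv_data, chunksize=2):
    pieces.append(chunk.merge(df2, on='level', how='inner'))
new_df = pd.concat(pieces, ignore_index=True)
print(new_df)  # 3 merged rows
```

This only helps when the right-hand frame fits comfortably in memory; for two genuinely huge frames, `dask.dataframe.merge` or Spark, as suggested above, handles the partitioning for you.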