
I have 50 pickle files that are 0.5 GB each. Each pickle file contains a list of custom class objects. I have no trouble loading the files individually using the following function:

import pickle

def loadPickle(fp):
    # Open in binary mode and unpickle the whole list into memory.
    with open(fp, 'rb') as fh:
        listOfObj = pickle.load(fh)
    return listOfObj

However, when I try to iteratively load the files I get a memory leak.

l = ['filepath1', 'filepath2', 'filepath3', 'filepath4']
for fp in l:
    x = loadPickle(fp)
    print( 'loaded {0}'.format(fp) )

My memory overflows before 'loaded filepath2' is printed. How can I write code that guarantees that only a single pickle is loaded during each iteration?

Answers to related questions on SO suggest using objects defined in the weakref module or explicit garbage collection using the gc module, but I am having a difficult time understanding how I would apply these methods to my particular use case. This is because I have an insufficient understanding of how referencing works under the hood.

Related: Python garbage collection

Lionel Brooks

1 Answer


You can fix that by adding `x = None` as the first statement inside the `for fp in l:` loop.

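This works because it drops the reference that x holds to the previous list, allowing the Python garbage collector to free that memory before loadPickle() is called again. Without it, x still refers to the previous list while the next one is being loaded, so two lists are in memory at once.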
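As a rough sketch of that change, the loop could look like the following. The small temporary pickle files, the `process_files` wrapper, and the snake_case name `load_pickle` are stand-ins invented here so the example runs on its own; they are not from the question.

```python
import os
import pickle
import tempfile


def load_pickle(fp):
    # Same idea as loadPickle() in the question: unpickle one whole file.
    with open(fp, 'rb') as fh:
        return pickle.load(fh)


def process_files(paths):
    """Load one pickle at a time, releasing the previous list first."""
    sizes = []
    x = None
    for fp in paths:
        # Drop the reference to the previous list BEFORE loading the next
        # one, so the two lists never need to be in memory together.
        x = None
        x = load_pickle(fp)
        sizes.append(len(x))
        print('loaded {0}'.format(fp))
    return sizes


# Hypothetical stand-in data: three tiny pickles instead of 0.5 GB files.
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    fp = os.path.join(tmp_dir, 'part{0}.pkl'.format(i))
    with open(fp, 'wb') as fh:
        pickle.dump(list(range(i + 1)), fh)
    paths.append(fp)

sizes = process_files(paths)
```

Whether the memory is returned promptly depends on the interpreter; the comments below discuss that caveat.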

Ionut Hulub
  • This isn't actually guaranteed to work. Python is _allowed_ to dispose of the object once you stop referencing it, but it isn't _required_ to. Fortunately, the CPython implementation _will_ do so, as long as you don't have any circular references anywhere, but code that depends on something that's explicitly documented to not be guaranteed should at the very least come with comments and warnings… – abarnert Apr 29 '13 at 22:15
  • Wouldn't it be better to do `del x`? Just to be sure that variable `x` will be dereferenced. – juliomalegria Apr 29 '13 at 22:30
  • I don't think that makes a difference, but that's a valid alternative. – Ionut Hulub Apr 29 '13 at 22:48
  • I have a situation where it does not work: memory use still accumulates. – Zbyszek Jan 22 '19 at 16:47
  • @Zbyszek: with the `del` or the `=None` alternative? :) – Roelant Mar 19 '19 at 13:19
  • @Roelant `del` shouldn't be needed, but I tried both back then. I was trying to pickle a large amount of small image data from OpenCV; in the end I just saved the images in PNG format and packed them into a zip instead of trying to pickle them in batches. – Zbyszek Mar 19 '19 at 20:57
  • This is so weird. The `del` instruction didn't work but adding a `= None` statement did it on my side. – LucG Apr 07 '20 at 18:05
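
The caveat about circular references raised in the comments can be illustrated with a small experiment. This is only a sketch and it relies on CPython's reference counting, which other Python implementations do not have to share; the `Node` class and the `is_freed_after_release` helper are hypothetical names made up for the illustration.

```python
import gc
import weakref


class Node:
    """Tiny custom class that can point at another Node."""

    def __init__(self):
        self.other = None


def is_freed_after_release(make_cycle):
    """Return (freed right after `a = None`, freed after gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic cycle collector out of the experiment
    try:
        a = Node()
        if make_cycle:
            b = Node()
            a.other = b
            b.other = a  # a and b now reference each other
            b = None     # only the cycle itself keeps b alive
        probe = weakref.ref(a)  # lets us observe a without keeping it alive
        a = None
        freed_immediately = probe() is None
        gc.collect()            # explicit pass of the cycle collector
        freed_after_collect = probe() is None
        return freed_immediately, freed_after_collect
    finally:
        if was_enabled:
            gc.enable()


no_cycle = is_freed_after_release(False)
with_cycle = is_freed_after_release(True)
print(no_cycle, with_cycle)
```

On CPython, the object without a cycle is reclaimed as soon as the last reference is dropped, while the one in a cycle survives until the cycle collector runs. If the unpickled objects reference each other, calling `gc.collect()` after `x = None` may therefore be worth trying.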