Python: minimizing memory usage with functions

Question

I am writing code in which, at some point, I need to solve several generalized eigenvalue problems for large sparse matrices. Because these operations are essentially the same (only the names of the matrices change), I wrote a function:

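import scipy.io
from scipy.sparse.linalg import eigsh

def eig_prob(myvariables):
  # this is just a simplified example
  name = 'iteration_' + str(myvariables["i"])  # str() in case "i" is an int
  A = myvariables["A"]
  B = myvariables["B"]
  N = myvariables["nb_eig"]
  # N eigenpairs of the generalized problem A v = z B v, shift-invert around sigma = 1
  Z, V = eigsh(A, N, B, sigma=1)
  # save in Matlab format (files is assumed to be a dict of paths defined elsewhere)
  scipy.io.savemat(files["exec"] + name + ".mat", {"Z": Z, "V": V})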

As I do not return anything to my main function, I would expect RAM usage to be the same before and after the call to eig_prob.

In fact, I observe that RAM usage increases by about 800 MB during the call to eig_prob, which is expected, but this memory is not freed after the call, which surprises me.

Is there any explanation for such behavior? Can it be avoided? Do I need to run my function as a subprocess to avoid this excess memory consumption?

Edit: a colleague of mine suggested that gc.collect() [1] may help, and it does! When called after the function, gc.collect() frees the 800 MB.

[1] https://docs.python.org/2/library/gc.html
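
A minimal sketch of why gc.collect() can make a difference, assuming the objects left behind by the call are caught in reference cycles (an assumption; nothing in the question confirms that this is what happens inside eigsh). Objects in a cycle never reach a reference count of zero, so they stay alive until the cycle collector runs. The Node class and the make_cycle helper are hypothetical names used only for illustration:

```python
import gc
import weakref


class Node:
    """Hypothetical object holding a large buffer and a reference to itself."""

    def __init__(self, size):
        self.buffer = bytearray(size)
        self.me = self  # reference cycle: the refcount cannot drop to zero


def make_cycle(size=10_000_000):
    node = Node(size)
    # A weak reference lets us observe whether the object is still alive
    # without keeping it alive ourselves.
    return weakref.ref(node)


gc.disable()  # keep the automatic collector out of the way for the demo
probe = make_cycle()
alive_before = probe() is not None   # still alive although no name refers to it
unreachable = gc.collect()           # explicit full collection
alive_after = probe() is not None    # the cycle has been broken and freed
gc.enable()

print(alive_before, alive_after, unreachable)
```

In normal operation the collector runs automatically from time to time, so an explicit call mostly changes when the memory is released rather than whether it is.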

You can try to force memory release with gc.collect(). See http://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python. — Bérenger, Jan 16 '15 at 15:20
This will not work in all cases under CPython (though you may have better luck with PyPy et al.), because CPython never relocates objects (`id()` is required to always return the same value throughout an object's lifespan). A block of memory cannot be released so long as it holds at least one live object. — Kevin, Jan 16 '15 at 15:25

score 0 · Answer 1 · answered Jan 16 '15 at 14:58

When a Python object is allocated, it is placed on the heap of the process.

If the object is quite large, its memory is allocated via mmap() for as long as it is needed and released again afterwards. I am not sure whether that happens immediately...

For smaller objects, the brk() boundary of the process is moved up to make room. If other objects are allocated afterwards and the earlier ones are then freed, their memory becomes free space on the heap but cannot be returned to the OS. Only once the end-most object on the heap is freed can part of the free area be returned to the OS.
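
To check whether Python itself still holds the objects, as opposed to the allocator merely keeping freed space, one option is the standard tracemalloc module, which only counts memory that the interpreter currently has allocated. The following is a rough sketch; the function name measure_small_allocations and the sizes are made up for illustration. If the traced figure drops after the objects are deleted while the RAM usage reported by the OS stays high, the remaining memory is free heap space that has not been handed back.

```python
import tracemalloc


def measure_small_allocations(count=100_000, size=100):
    """Allocate many small objects, free them, and report traced memory.

    Returns (in_use, after_free, peak) in bytes, as seen by tracemalloc.
    """
    tracemalloc.start()
    small = [bytearray(size) for _ in range(count)]  # many small heap objects
    in_use, _ = tracemalloc.get_traced_memory()
    del small                                        # drop the only reference
    after_free, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return in_use, after_free, peak


in_use, after_free, peak = measure_small_allocations()
print(f"in use: {in_use} B, after free: {after_free} B, peak: {peak} B")
```

Note that tracemalloc says nothing about what the OS sees; it has to be compared with an external measurement such as top or a task manager.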

You mention 800 MB, which is clearly large enough that mmap() should be used for a single allocation, but if the data consists of thousands of smaller objects, chances are that they land on the brk() heap.
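
If the memory does end up on the brk() heap, running the heavy step in a short-lived child process, as the question suggests, sidesteps the problem: when the child exits, the OS reclaims all of its memory. Below is a minimal sketch using only the standard library. CHILD_CODE is a placeholder workload and run_in_child is a hypothetical helper, not the asker's eig_prob; a real version would have to pass the matrices to the child, for example through files.

```python
import subprocess
import sys

# Placeholder workload: builds a large temporary list inside the child only.
CHILD_CODE = """
data = [float(i) for i in range(1_000_000)]
print(sum(data))
"""


def run_in_child(code):
    """Run a snippet in a fresh interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_in_child(CHILD_CODE))
```

The parent only ever holds the small string that the child printed, so its own memory footprint stays flat regardless of what the child allocated.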

it seems that the only way to free the memory in my case is to kill the process... — Alain, Jan 16 '15 at 15:00
