
I am trying to remove a memory bottleneck in my program. Here is the interesting part:

print_mem_info()
print("creating array")
arr = np.empty(vol_to_write.get_shape(), dtype=np.float16)
for v_tmp, a_tmp in zip(v_list, a_list):
    s = to_basis(v_tmp, vol_to_write).get_slices()
    arr[s[0][0]:s[0][1],s[1][0]:s[1][1],s[2][0]:s[2][1]] = copy.deepcopy(a_tmp)
print_mem_info()
print("deleting array")
del arr
print_mem_info()

Here is the output:

Used RAM:  4217.71875 MB
creating array
Used RAM:  4229.68359375 MB
deleting array
Used RAM:  4229.2890625 MB

For print_mem_info I am just using the psutil library:

def print_mem_info():
    mem = psutil.virtual_memory()
    swap = psutil.swap_memory()
    used_ram = (mem.total - mem.available) /1024 /1024
    used_swap = swap.used /1024 /1024 
    print("Used RAM: ", used_ram, "MB")
    # print("Used swap: ", used_swap, "MB")

I am just creating a numpy array, filling it, and then deleting it (in the real program I am supposed to delete it later, but for debugging purposes I put the del here). What I cannot understand is why the del does not remove the array from RAM, as there are no other references to this array. I tried gc.collect() and it did nothing.

I read a lot of other posts on Stack Overflow but I could not figure it out. I know that gc.collect() is not normally needed, and I read somewhere that using del is not recommended, but I am manipulating very big numpy arrays so I cannot just leave them in RAM.
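(As a side note on del: it removes a name binding, not the object itself, so the array's buffer is only freed once its reference count drops to zero. A minimal sketch, with variable names chosen just for illustration:)

```python
import sys

import numpy as np

arr = np.empty((100, 100))
alias = arr                   # a second reference to the same array object

# getrefcount reports at least 3 here: arr, alias, and the
# temporary reference created by the function call itself
print(sys.getrefcount(arr))

del arr                       # unbinds the name; the object survives via alias
print(alias.shape)            # still fully accessible

del alias                     # last reference gone; NumPy frees the buffer now
```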


[edit]:

I tried creating a minimal example here:

import numpy as np
import psutil, os

def print_mem_info():
    process = psutil.Process(os.getpid())
    print(process.memory_info().vms // 1024 // 1024)

if __name__ == "__main__":
    print("program starts")
    print_mem_info()

    print("creating samples...")
    a_list = list()
    for i in range(4):
        a_list.append(np.random.rand(100,100,100))
    print_mem_info()

    print("creating array...")
    arr = np.empty((400,100,100))
    print_mem_info()

    print("filling the array...")
    for i, a_tmp in enumerate(a_list):
        arr[i*100:(i+1)*100,:,:] = a_tmp
        del a_tmp
    print_mem_info()

    print("deleting the array...")
    del arr
    print_mem_info()
ava_punksmash
  • Unfortunately it is more complicated than that, it is an experiment that I am doing for a research thesis, and I am running several scripts one after the other, you need to configure file paths, clone some of my projects etc.... Sorry about that I wish I could – ava_punksmash Jul 16 '20 at 20:04
  • My bad it is "psutil" I put the code in the question (https://psutil.readthedocs.io/en/latest/) – ava_punksmash Jul 16 '20 at 20:08
  • 1
    First, `del` doesn't delete objects. `del arr` unbinds the `arr` variable. Second, freeing an object doesn't necessarily return memory to the OS. – user2357112 Jul 16 '20 at 20:10
  • There's the problem. In an empty program, it measures 11 GB used on my machine. Without any code except print_mem_info(). You are getting system wide information but you should look at program wide info – Thomas Weller Jul 16 '20 at 20:11
  • @ThomasWeller are you sure that you are not using any RAM at all? That's weird... For me it seems to work; if I don't run anything else, psutil tells me the same as htop about my memory consumption. – ava_punksmash Jul 16 '20 at 20:15
  • Of course I have dozens of programs open, but they have nothing to do with numpy. – Thomas Weller Jul 16 '20 at 20:15
  • @user2357112supportsMonica I know about the first point; for the second point I thought that unbinding the array would actually return the data to the OS, because at the beginning of the program it does so. I don't know when it works and when it does not. For example, if you open a Python shell and create and delete the array right away, you should see it removed from RAM with htop – ava_punksmash Jul 16 '20 at 20:17
  • Related: https://stackoverflow.com/questions/27574881/why-does-numpy-zeros-takes-up-little-space – Thomas Weller Jul 16 '20 at 20:19
  • Related: https://stackoverflow.com/questions/938733/total-memory-used-by-python-process – Thomas Weller Jul 16 '20 at 20:23
  • What OS are you using? In Linux, `del`'ing all references to a large array (10 MB+) will return the memory to the OS. If you have complicated dependencies (classes containing objects of other classes, views of the same array) it may take a `gc.collect` call to force them to be released. – Han-Kwang Nienhuys Jul 16 '20 at 20:25
  • I am using Fedora 30 and gc.collect() did nothing unfortunately – ava_punksmash Jul 16 '20 at 20:27
  • @ava_punksmash you cannot free **any object** in Python. Python uses automatic memory management; the CPython implementation in particular uses reference counting (and an auxiliary cyclic garbage collector that you are manipulating with the `gc` module). – juanpa.arrivillaga Jul 16 '20 at 20:52
  • I added a minimal example at the bottom of my question if it can help. I just hope this is really representative of my problem – ava_punksmash Jul 16 '20 at 21:09
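(The behavior described in the comments above, where large allocations are returned to the OS immediately on release while small ones stay in the allocator's heap, can be sketched as follows, assuming CPython on Linux with glibc malloc; large NumPy buffers are served by mmap, so freeing them calls munmap right away:)

```python
import os

import numpy as np
import psutil

proc = psutil.Process(os.getpid())

def vms_mb():
    # per-process virtual memory size, in MB
    return proc.memory_info().vms // 1024 // 1024

base = vms_mb()
big = np.ones((4000, 4000))    # ~128 MB, far above glibc's mmap threshold
grown = vms_mb()
del big                        # last reference gone -> the block is munmap'd
freed = vms_mb()

print(base, grown, freed)      # freed should be back close to base
```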

1 Answer


You are measuring the memory on system level, not on process level. You don't know what all other processes on your machine are doing.

Be careful with example code for measuring the memory of a process. Many examples mix up virtual memory and physical memory.

RSS (Linux term) and Working Set (Windows term) are not good for discussing your problem, because they only consider the part of memory which is currently in physical RAM. Since that heavily depends on how much physical RAM you have, it will vary between machines and is not comparable at all.

VMS (Linux term) or Private Bytes (Windows term) is much more reliable, since it also counts memory that is in use but has been swapped to disk because you don't have enough physical RAM.

The following code should help you get things started:

import numpy as np
import psutil
import os

def print_mem_info():
    process = psutil.Process(os.getpid())
    print(process.memory_info().vms // 1024 // 1024)

print_mem_info()
arr = np.empty((100000,100000))
print_mem_info()
del arr
print_mem_info()

On my machine, it prints

261
76705
262

The 76 GB is plausible for 100,000 × 100,000 items at 8 bytes each.

With RSS, the effect is not visible:

47
47
47
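(To make the contrast visible in one run, here is a sketch, with a smaller array than above and assuming Linux, that prints both metrics: `np.empty` only grows VMS because the pages are untouched, writing to the array moves them into RSS, and `del` shrinks both:)

```python
import os

import numpy as np
import psutil

proc = psutil.Process(os.getpid())

def mem_mb():
    info = proc.memory_info()
    return info.rss // 1024 // 1024, info.vms // 1024 // 1024

print("start  (rss, vms):", mem_mb())
arr = np.empty((10000, 10000))   # ~800 MB of address space, pages untouched
print("empty  (rss, vms):", mem_mb())
arr[:] = 1.0                     # touching the pages moves them into RSS
print("filled (rss, vms):", mem_mb())
del arr
print("freed  (rss, vms):", mem_mb())
```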
Thomas Weller
  • Thank you for this information, it is very interesting; I will use it to monitor my program now. Unfortunately I still cannot see my array being removed, though (I think there is a problem with copying data from a_tmp) – ava_punksmash Jul 16 '20 at 20:53