
I have a program that uses and creates very large numpy arrays. However, garbage collection doesn't seem to be releasing the memory of these arrays.

Take the following script as an example; it reports that `big` is not tracked (i.e. `gc.is_tracked(big)` is `False`).

import gc
import numpy as np

big = np.ones((100, 100, 100))
print(gc.is_tracked(big))  # prints False
big2 = big + 1

gc.get_referrers and the like are redundant in this case, returning empty lists, since the array isn't tracked.
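For completeness, a minimal check of that claim (assuming the imports from the snippet above; the exact output can vary by environment):

# gc.get_referrers only searches containers tracked by the collector,
# so for an untracked array the OP reports this comes back empty.
print(gc.get_referrers(big))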

memory_profiler confirms this with the following output, showing that neither `del` (as expected) nor `gc.collect()` frees the memory.

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     5    212.3 MiB    212.3 MiB           1   @mprof
     6                                         def main():
     7    212.3 MiB      0.0 MiB           1       size = 100
     8    242.8 MiB     30.5 MiB           1       big = np.ones((size, size, size))
     9    242.8 MiB      0.0 MiB           1       print(gc.is_tracked(big))
    10    273.3 MiB     30.5 MiB           1       big2=big+1
    11    273.3 MiB      0.0 MiB           1       del big
    12    273.3 MiB      0.0 MiB           1       gc.collect()
    13    273.3 MiB      0.0 MiB           1       big2
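For reference, here is a minimal, runnable reconstruction of the profiled script, assuming (as the comments below confirm) that `mprof` is `memory_profiler.profile`; the `__main__` guard is my addition:

import gc
import numpy as np
from memory_profiler import profile as mprof

@mprof
def main():
    size = 100
    big = np.ones((size, size, size))  # large float64 array
    print(gc.is_tracked(big))          # reported as False by the OP
    big2 = big + 1
    del big
    gc.collect()
    big2  # bare reference, keeps big2 alive up to this point

if __name__ == "__main__":
    main()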

This thread implies that such arrays should be tracked and collected. Does anyone know why these arrays aren't being tracked in my configuration, how to make sure they are, or how else to manually free the memory?

I'm on macOS, using the latest versions at the time of writing: numpy 1.22.1 and Python 3.9.

Many thanks in advance.

  • Yeah unfortunately, if you look at the memory_profiler code you'll see that on lines 11 & 12. – Suzie Q. Jan 26 '22 at 16:10
  • Ah sorry, I didn't read your post carefully; usually it should do the trick. – nicofrlo Jan 26 '22 at 16:24
  • does it change if you have a bigger object? or you also delete `big2`? – ti7 Jan 26 '22 at 16:25
  • Unfortunately, neither of those helps. I don't have access to another machine to check whether numpy objects are tracked more generally, but the problem also reproduces on https://www.online-python.com. – Suzie Q. Jan 26 '22 at 16:38
  • What's `mprof`? `from memory_profiler import profile as mprof`? – wjandrea Jan 26 '22 at 18:20
  • FWIW, I'm no expert, but I can't seem to reproduce the issue. `del big` increments -7.6 MiB (after `big =` incremented 7.7 MiB). `gc.is_tracked(big)` returns `False`, but that might be normal, IDK. Are you using IPython? Are you using an alternate implementation like PyPy? I'm using CPython 3.8 Anaconda on Ubuntu 20.04. (Assuming `from memory_profiler import profile as mprof`.) – wjandrea Jan 26 '22 at 18:32
  • AFAIK this depends on the default allocator behaviour of your libc (the standard C library that Numpy uses) as well as your operating system. Some allocators are conservative, since not releasing memory to the OS is often faster. However, this should not be the case for big arrays like >=1 GiB (eg. a 128x1000x1000 float64-based array). If so, then it is likely a bug, and certainly not a bug in Numpy. `gc.is_tracked` returns `False` on my Windows with Numpy 1.20.3, but it does release the allocated memory. I think reproducing the problem in a C program should help to check this hypothesis [a Python-level sketch of this idea follows these comments]. – Jérôme Richard Jan 26 '22 at 18:43
  • `gc.is_tracked` is irrelevant: that concerns only potential **cycles** among Python objects, which is obviously impossible for a Numpy array that contains no user-selected Python references. – Davis Herring Jan 26 '22 at 19:33
  • @wjandrea yes that's exactly what mprof is! Thanks Jerome and David, super helpful. Maybe something to do with the M1 architecture. Good to know it's machine specific, I'll try spinning it up on a linux server and see. – Suzie Q. Feb 02 '22 at 11:38
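Building on Jérôme Richard's comment about allocator behaviour, here is a hedged sketch of one way to ask the allocator to hand freed heap pages back to the OS. It relies on glibc's malloc_trim, so it only applies on Linux (not macOS), whether it actually lowers the reported memory depends on the allocator's state, and the helper name is mine:

import ctypes
import gc

import numpy as np

def release_freed_memory():
    """Ask glibc to return free heap pages to the OS (Linux/glibc only)."""
    try:
        libc = ctypes.CDLL("libc.so.6")
    except OSError:
        return  # not glibc (e.g. macOS): nothing to do here
    libc.malloc_trim(0)

big = np.ones((100, 100, 100))
del big           # drop the Python reference; numpy frees the buffer
gc.collect()      # irrelevant for untracked arrays, kept to mirror the question
release_freed_memory()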

0 Answers