I am working with a simple NumPy structured array (built from a custom numpy.dtype) and I am using numpy.savez and numpy.load to save the array to disk and read it back. During both saving and loading, the memory usage shown by 'top' is much higher than I would expect. Below is sample code that demonstrates this.

import sys
import numpy as np
import time

RouteEntryNP = np.dtype([('final', 'u1'), ('prefix_len', 'u1'),
                        ('output_idx', '>u4'), ('children', 'O')])

a = np.zeros(1000000, RouteEntryNP)

time.sleep(10)

print(sys.getsizeof(a))
with open('test.np.npz', 'wb+') as f:
    np.savez(f, a=a)

while True:
    time.sleep(10)

The program starts with a memory usage of about 25M, which is reasonably close to intuition: the fields of RouteEntryNP add up to 14 bytes per record, so one million records should take about 14M. But while the data is being written to the file, the memory usage shoots up to approximately 250M.
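As a quick check of the 14-byte figure, the record size can be read off the dtype itself. This is a minimal sketch that assumes a typical 64-bit CPython build, where an object pointer takes 8 bytes; the names `pointer_size` and `expected_itemsize` are only for illustration.

```python
import numpy as np

RouteEntryNP = np.dtype([('final', 'u1'), ('prefix_len', 'u1'),
                         ('output_idx', '>u4'), ('children', 'O')])

a = np.zeros(1000000, RouteEntryNP)

# The dtype is packed (no align=True), so the record size is the sum of
# the field sizes: 1 + 1 + 4 bytes plus one object pointer.
pointer_size = np.dtype('O').itemsize  # 8 on a typical 64-bit build
expected_itemsize = 1 + 1 + 4 + pointer_size

print(RouteEntryNP.itemsize)  # bytes per record
print(a.nbytes)               # bytes held by the array's own buffer
print(RouteEntryNP.hasobject) # True: the 'children' field holds Python objects
```

Note that `a.nbytes` only counts the pointers stored in the 'children' field, not the Python objects they refer to. Because the dtype contains an object field, the array is pickled on save rather than written as a raw buffer, which may account for some of the extra memory seen while saving.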

Similar behavior is observed when loading the file: the memory usage shoots up to approximately 160M, and an explicit gc.collect() doesn't seem to help either. This is how I am reading the file:

import gc
import numpy as np

np.load('test.np.npz')
gc.collect()

The memory usage stays at 160M. I am not sure why this is happening. Is there a way to 'reclaim' this memory?
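One way to look at this from inside the process, rather than through 'top', is the standard-library `tracemalloc` module. The sketch below is only an illustration under a few assumptions: it uses a smaller array (100000 records) so it runs quickly, writes to a temporary directory, passes `allow_pickle=True` to `np.load` (needed on recent NumPy versions to read object fields), and uses hypothetical helper names `traced_peak` and `load_array`.

```python
import os
import tempfile
import tracemalloc

import numpy as np

RouteEntryNP = np.dtype([('final', 'u1'), ('prefix_len', 'u1'),
                         ('output_idx', '>u4'), ('children', 'O')])


def traced_peak(func):
    """Run func() and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def load_array(path):
    """Read array 'a' from the .npz file and close the file afterwards."""
    with np.load(path, allow_pickle=True) as data:
        return data['a']


# Assumption: a smaller array than in the question, to keep the run short.
a = np.zeros(100000, RouteEntryNP)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.np.npz')
    _, save_peak = traced_peak(lambda: np.savez(path, a=a))
    b, load_peak = traced_peak(lambda: load_array(path))

print('array buffer :', a.nbytes)
print('peak on save :', save_peak)
print('peak on load :', load_peak)
```

The peaks reported here cover allocations that Python and NumPy report to `tracemalloc`, so they will not match the resident size shown by 'top'. Comparing them with `a.nbytes` should still give a rough idea of how much temporary memory saving and loading need. Dropping the last reference (`del b`) returns the memory to the allocator, but the process may not hand it back to the operating system right away.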

gabhijit
  • Python won't release the memory back to the OS immediately, to reduce churn; `top` therefore won't tell you how much of the memory held by the Python process is actually *"in use"*. If you want to understand what's going on in your program better, use a Python-specific memory profiler. Related: http://stackoverflow.com/q/15455048/3001761 – jonrsharpe Oct 12 '15 at 14:11
  • Thanks @jonrsharpe. I tried `heapy`, but it was not of much use; it clearly wasn't able to account for the memory usage. Something that can dig deeper into the actual allocations would be useful. Any pointers? I also tried the `massif` tool from `valgrind`, but without much luck. – gabhijit Oct 12 '15 at 15:53
  • See http://stackoverflow.com/q/110259/3001761 – jonrsharpe Oct 12 '15 at 15:57
  • Try saving something more general than `np.zeros`. That has some sort of 'lazy memory allocation'. – hpaulj Oct 12 '15 at 16:23
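The last suggestion can be tried by writing non-zero data into every record before measuring, so that any lazily allocated pages are actually touched. This is a rough sketch of that idea; the field values (1, 24, a running index, None) are arbitrary choices for illustration and do not come from the question.

```python
import numpy as np

RouteEntryNP = np.dtype([('final', 'u1'), ('prefix_len', 'u1'),
                         ('output_idx', '>u4'), ('children', 'O')])

n = 1000000
a = np.zeros(n, RouteEntryNP)

# Write to every field of every record so the whole buffer is touched.
a['final'] = 1
a['prefix_len'] = 24
a['output_idx'] = np.arange(n, dtype='u4')
a['children'] = None

print(a.nbytes)  # size of the array's own buffer in bytes
print(a[:3])     # first few records, to confirm the fill
```

With the array filled this way, the baseline figure in 'top' before saving should reflect the full buffer, which makes the comparison with the usage during np.savez and np.load more meaningful.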
