I am working with a simple structured numpy array (a numpy.dtype with named fields) and I am using numpy.savez
and numpy.load
to save the array to disk and read it back. During both storing and loading of the array, the memory usage reported by 'top' is much higher than I would expect. Below is a sample program that demonstrates this.
import sys
import numpy as np
import time
RouteEntryNP = np.dtype([('final', 'u1'), ('prefix_len', 'u1'),
                         ('output_idx', '>u4'), ('children', 'O')])

a = np.zeros(1000000, RouteEntryNP)
time.sleep(10)
print(sys.getsizeof(a))

with open('test.np.npz', 'wb+') as f:
    np.savez(f, a=a)

while True:
    time.sleep(10)
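For reference, the sizes I am expecting can be checked directly from the dtype itself (a small sanity check, separate from the program above; the 14-byte figure assumes a 64-bit platform, where the 'O' field is an 8-byte object pointer):

```python
import numpy as np

RouteEntryNP = np.dtype([('final', 'u1'), ('prefix_len', 'u1'),
                         ('output_idx', '>u4'), ('children', 'O')])

# Packed struct: 1 + 1 + 4 bytes of data, plus an 8-byte object
# pointer for 'children' on a 64-bit platform.
print(RouteEntryNP.itemsize)  # 14

a = np.zeros(1000000, RouteEntryNP)
print(a.nbytes)               # 14000000, i.e. ~14M for the buffer itself
```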
The program starts with a memory usage of about 25M, which is somewhat close to intuition: each RouteEntryNP item occupies 14 bytes, so the array itself is about 14M. But as the data is being written to the file, the memory usage shoots up to approximately 250M.
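To quantify the spike without relying on top, I can reproduce it with tracemalloc (a rough sketch at 1/10th of the original size; recent NumPy versions report their allocations to tracemalloc, and an in-memory buffer stands in for the file here):

```python
import io
import tracemalloc
import numpy as np

RouteEntryNP = np.dtype([('final', 'u1'), ('prefix_len', 'u1'),
                         ('output_idx', '>u4'), ('children', 'O')])
a = np.zeros(100000, RouteEntryNP)  # 1/10th of the original size

tracemalloc.start()
buf = io.BytesIO()                  # stand-in for the on-disk file
np.savez(buf, a=a)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Because of the object-typed 'children' field, savez falls back to
# pickling the array, which builds extra in-memory copies of the data.
print(peak)
```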
Similar behavior is observed when loading the file: the memory usage shoots up to approximately 160M, and an explicit gc.collect() doesn't help either. The way I am reading the file is as follows.
import numpy as np
import gc

np.load('test.np.npz')
gc.collect()
The memory usage stays at 160M. I am not sure why this is happening. Is there a way to 'reclaim' this memory?