I'm trying to create efficient broadcast arrays in numpy, e.g. a set of shape=[1000,1000,1000]
arrays that have only 1000 elements, but repeated 1e6 times. This can be achieved both through np.lib.stride_tricks.as_strided
and np.broadcast_arrays
.
However, I am having trouble verifying that there is no duplication in memory, and this is critical since tests that actually duplicate the arrays in memory tend to crash my machine leaving no traceback.
I've tried examining the size of the arrays using .nbytes
, but that doesn't seem to correspond to the actual memory usage:
>>> import numpy as np
>>> import resource
>>> initial_memuse = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> pagesize = resource.getpagesize()
>>>
>>> x = np.arange(1000)
>>> memuse_x = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of x = {0} MB".format(x.nbytes/1e6))
Size of x = 0.008 MB
>>> print("Memory used = {0} MB".format((memuse_x-initial_memuse)*resource.getpagesize()/1e6))
Memory used = 150.994944 MB
>>>
>>> y = np.lib.stride_tricks.as_strided(x, [1000,10,10], strides=x.strides + (0, 0))
>>> memuse_y = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of y = {0} MB".format(y.nbytes/1e6))
Size of y = 0.8 MB
>>> print("Memory used = {0} MB".format((memuse_y-memuse_x)*resource.getpagesize()/1e6))
Memory used = 201.326592 MB
>>>
>>> z = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0))
>>> memuse_z = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of z = {0} MB".format(z.nbytes/1e6))
Size of z = 80.0 MB
>>> print("Memory used = {0} MB".format((memuse_z-memuse_y)*resource.getpagesize()/1e6))
Memory used = 0.0 MB
So .nbytes
reports the "theoretical" size of the array, but apparently not the actual size. The resource
checking is a little awkward, as it looks like there are some things being loaded & cached (perhaps?) that result in the first striding taking up some amount of memory, but future strides take none.
tl;dr: How do you determine the actual size of a numpy array or array view in memory?