I know np.zeros() does not take up RAM at allocation time. But even after I modified some entries of the array, the memory usage measured by psutil.Process().memory_info().rss still showed no increase. Why? Shouldn't the memory usage grow by at least one page as soon as I write to the array data? What happens under the hood?

Here is how I measure the memory usage of np.zeros():

>>> import numpy as np
>>> import psutil
>>> 
>>> N = 10**8
>>> process = psutil.Process()
>>> 
>>> start = process.memory_info()
>>> a = np.zeros(N, dtype=np.float64)  # should take up around 8e8 bytes
>>> end = process.memory_info()
>>> print(f"""rss {end.rss-start.rss},
... vms {end.vms-start.vms:.2e}, 
... data {end.data-start.data:.2e}""")

Output

rss 0,
vms 8.00e+08, 
data 8.00e+08

Initially, the numpy array a adds 0 bytes to rss because np.zeros() calls calloc() (see this SO post for more detail about calloc()) and does not touch the allocated memory. No problem.

However, when I write to the array, I expect the process to trigger page faults and rss to start growing. I wrote to the first n = 10**3 entries, and rss did not change (the increase was still 0):

>>> n = 10**3
>>> a = np.zeros(N, dtype=np.float64)
>>> start = process.memory_info()
>>> for i in range(n):  # write only the first n entries
...     a[i] = 1.
>>> end = process.memory_info()
>>> print(f"rss {end.rss-start.rss}")
rss 0
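For reference, here is the back-of-the-envelope estimate behind my expectation. It is only a sketch: the 4096-byte page size and the page-aligned start of the array are assumptions I have not verified, and pages_touched is a helper name I made up for this post.

```python
ITEMSIZE = 8  # bytes per np.float64 entry


def pages_touched(n_entries, page_size=4096, offset=0):
    """Number of distinct pages covered by the first n_entries float64 values.

    offset is the byte offset of the first element within its page
    (0 means page-aligned, which is an assumption, not something I checked).
    """
    if n_entries <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + n_entries * ITEMSIZE - 1) // page_size
    return last_page - first_page + 1


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**8):
        pages = pages_touched(n)
        print(f"n={n}: {pages} pages, {pages * 4096} bytes")
```

Under these assumptions, writing the first 10**3 entries covers 2 pages (8192 bytes), so I expected rss to grow by about that much rather than stay at 0.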

Only past some value of n did rss begin to change:

>>> n = 10**4  # use a bigger n now
>>> a = np.zeros(N, dtype=np.float64)
>>> start = process.memory_info()
>>> for i in range(n):  # write only the first n entries
...     a[i] = 1.
>>> end = process.memory_info()
>>> print(f"rss {end.rss-start.rss}")
rss 536576

(I also noticed that the threshold value of n at which rss becomes non-zero changes when new variables are added to the current environment. I don't know why.)

Finally, when the whole array was modified, the rss increase got close to 8e8 bytes (but not exactly):

>>> n = N  # the whole array size
>>> a = np.zeros(N, dtype=np.float64)
>>> start = process.memory_info()
>>> for i in range(n):  # write every entry
...     a[i] = 1.
>>> end = process.memory_info()
>>> print(f"rss {end.rss-start.rss:.4e}")
rss 7.9992e+08

Shouldn't rss increase by at least one page size as soon as I write to the array data?

Shouldn't rss increase by at least 8e8 bytes once I have modified the whole array?

Could anyone clarify these points?

(Optional) I am also confused about how memory usage metrics like rss work and when it is appropriate to use them. Could anyone share some good practices for measuring memory usage, or references on it?
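In case it helps to rule out numpy and psutil, below is a minimal stdlib-only sketch of the same experiment. It assumes Linux (it reads /proc/self/statm, whose second field is the resident page count) and uses an anonymous private mmap as a stand-in for the calloc()-backed buffer, which is my assumption rather than something I verified about numpy. The helper names rss_bytes and rss_growth_after_touching are made up for this post.

```python
import mmap

PAGE = mmap.PAGESIZE


def rss_bytes():
    """Resident set size of this process in bytes, or None if /proc is unavailable."""
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * PAGE


def rss_growth_after_touching(n_pages):
    """Map n_pages of zero-filled anonymous memory, write one byte per page,
    and return how much rss grew in bytes (None if rss cannot be read)."""
    buf = mmap.mmap(-1, n_pages * PAGE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        before = rss_bytes()
        for i in range(n_pages):
            buf[i * PAGE] = 1  # the first write to a page should fault it in
        after = rss_bytes()
    finally:
        buf.close()
    if before is None or after is None:
        return None
    return after - before


if __name__ == "__main__":
    for n_pages in (1, 16, 64, 256, 4096):
        print(n_pages, rss_growth_after_touching(n_pages))
```

I would naively expect each line to report roughly n_pages * mmap.PAGESIZE bytes. I have left out any output because it will depend on the machine and kernel.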

Thank you for sharing your knowledge.

JonasW