I know np.zeros() does not take up RAM at allocation time. But even after I modified some entries of the array returned by np.zeros(), the memory usage, as measured by psutil.Process().memory_info().rss, still showed no increase. Why? Shouldn't the memory usage increase by at least one page size as soon as I write to the array data? What happens internally?
Here is how I measure the memory usage of np.zeros():
>>> import numpy as np
>>> import psutil
>>>
>>> N = 10**8
>>> process = psutil.Process()
>>>
>>> start = process.memory_info()
>>> a = np.zeros(N, dtype=np.float64) # should take up around 8e8 bytes
>>> end = process.memory_info()
>>> print(f"""rss {end.rss-start.rss},
... vms {end.vms-start.vms:.2e},
... data {end.data-start.data:.2e}""")
Output
rss 0,
vms 8.00e+08,
data 8.00e+08
Initially, this numpy array a took up 0 rss because it called calloc() (see this SO post for more detail about calloc()) and did not touch the allocated memory. No problem.
However, when I wrote to the array, I expected the process to trigger page faults and start taking up some rss memory. I wrote to the first n=10**3 entries, but rss did not change; the difference was still 0:
>>> n = 10**3
>>> a = np.zeros(N, dtype=np.float64)
>>> start = process.memory_info()
>>> for i in range(n):
... a[i] = 1.
>>> end = process.memory_info()
>>> print(f"rss {end.rss-start.rss}")
rss 0
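To make my expectation concrete, here is a minimal sketch of the arithmetic I had in mind. It assumes the array buffer starts on a page boundary and that every page receiving a write is counted in rss immediately; neither assumption is verified. expected_touched_bytes is a hypothetical helper name, and mmap.PAGESIZE is just one way to obtain the page size.

```python
import mmap


def expected_touched_bytes(n_entries, itemsize=8, page_size=mmap.PAGESIZE):
    """Naive estimate of resident bytes after writing the first n_entries.

    Assumption (not verified): the buffer starts on a page boundary and
    every page that receives a write becomes resident right away.
    """
    n_bytes = n_entries * itemsize
    n_pages = -(-n_bytes // page_size)  # ceiling division
    return n_pages * page_size


# The three values of n used in this question (float64, 8 bytes per entry).
for n in (10**3, 10**4, 10**8):
    print(n, expected_touched_bytes(n))
```

With a 4096-byte page this gives 8192 bytes for n=10**3, which is roughly what I expected to see instead of 0.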
Only beyond a certain n did rss begin to change:
>>> n = 10**4 # use a bigger n now
>>> a = np.zeros(N, dtype=np.float64)
>>> start = process.memory_info()
>>> for i in range(n):
... a[i] = 1.
>>> end = process.memory_info()
>>> print(f"rss {end.rss-start.rss}")
rss 536576
(I also noticed that the threshold value of n at which rss becomes non-zero changed when new variables were added to the current environment. I don't know why.)
Finally, when the whole array was modified, the rss increase got close to 8e8 bytes (but not exactly):
>>> n = N # the whole array size
>>> a = np.zeros(N, dtype=np.float64)
>>> start = process.memory_info()
>>> for i in range(n):
... a[i] = 1.
>>> end = process.memory_info()
>>> print(f"rss {end.rss-start.rss:.4e}")
rss 7.9992e+08
Shouldn't rss increase by about one page size as soon as I write to the array data?
Shouldn't the rss increase be equal to or above 8e8 bytes once I have modified the whole array?
Could anyone clarify these points?
(Optional) I am also confused about how memory usage measurements like rss work and when it is appropriate to use them. Could anyone share some good practices for measuring memory usage, or references on the topic?
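For reference, the start/end pattern I use throughout this question can be wrapped in a small helper. This is only a sketch of my current approach, not a recommended practice: rss_delta is a hypothetical name, and the reader is passed in as a callable so that any counter (for example psutil's rss, assuming psutil is installed) can be plugged in.

```python
def rss_delta(action, read_rss):
    """Run action() and return (result, change in read_rss() across the call).

    read_rss is any zero-argument callable returning a byte count, e.g.
    lambda: psutil.Process().memory_info().rss (assumes psutil is installed).
    The result of action() is kept alive until after the second reading,
    so memory it owns is not released before being measured.
    """
    before = read_rss()
    result = action()
    after = read_rss()
    return result, after - before
```

For example, rss_delta(lambda: np.zeros(N), lambda: process.memory_info().rss) corresponds to my first measurement of the freshly allocated array.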
Thank you for sharing your knowledge.