When I allocate large arrays with NumPy, it appears as though some kind of lazy allocation is taking place, which I do not understand.
If I do
import numpy as np
a = np.empty(10**9)
while watching the system's memory usage (e.g. via htop), nothing happens. This allocates a billion 8-byte floats, so I would expect about 8 GB of additional memory to be used. Also, the operation takes only a few milliseconds.
If I now do
a[:] = 0
the memory usage jumps up to the expected 8 GB.
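For reference, the same thing can be observed from within Python itself rather than via htop. Here is a minimal sketch using the standard library resource module, assuming Linux (where ru_maxrss, the process's peak resident set size, is reported in kilobytes):

import resource
import numpy as np

def peak_rss_mb():
    # Peak resident set size of this process; ru_maxrss is in KB on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

a = np.empty(10**9)   # nominally 8 GB of float64
print(peak_rss_mb())  # small -- no 8 GB jump yet
a[:] = 0              # write to every element
print(peak_rss_mb())  # now roughly 8000 MB larger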
One might think that np.empty() is somehow being clever here. However, the same behavior is seen if I instead do
b = np.zeros(10**9)
Again, the memory does not seem to be allocated until I do e.g.
b[:] = 0
which ought to be a no-op. I can even loop over all the elements without the memory usage going up.
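By "loop over all elements" I mean any full read of the array; for example (reusing the imports and the peak_rss_mb helper from the snippet above, with b.sum() standing in for an explicit loop):

b = np.zeros(10**9)
s = b.sum()           # reads every one of the 10**9 elements
print(peak_rss_mb())  # still small -- no 8 GB jump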
Lastly, the same behavior is not seen with np.ones(). Here the memory is consumed immediately upon creation, which now takes about a second.
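The timing difference can be checked with something like this rough sketch using time.perf_counter (the exact numbers will of course vary by machine):

import time

t0 = time.perf_counter()
a = np.empty(10**9)              # returns almost immediately
print(time.perf_counter() - t0)  # a few milliseconds

t0 = time.perf_counter()
c = np.ones(10**9)               # fills the array with 1.0
print(time.perf_counter() - t0)  # about a second here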
What is going on?