
When I allocate large arrays with NumPy, it appears as though some kind of lazy allocation is taking place, which I do not understand.

If I do

import numpy as np
a = np.empty(10**9)

while watching the memory usage of the system (e.g. via htop), nothing happens. This allocates a billion 8-byte floats, so I would expect about 8 GB of additional memory to be used. The operation also takes only a few milliseconds. If I now do

a[:] = 0

the memory jumps up to what is expected.

One might think that np.empty() is somehow clever. However, the same behavior is seen if I instead do

b = np.zeros(10**9)

Again, the memory does not seem to be allocated until I do e.g.

b[:] = 0

which ought to be a no-op. I can even loop over all elements without the memory going up.

Lastly, the same behavior is not seen with np.ones(). Here the memory is consumed on creation, which now takes about a second.
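The observation can also be reproduced without htop by reading the process's resident set size directly. This is only a minimal sketch, assuming Linux (it reads /proc/self/statm); the helper `rss_mib` and the array size `n` are made up for illustration, and the size is much smaller than 10**9 so that it runs on modest machines.

```python
import os
import numpy as np


def rss_mib():
    """Resident set size of this process in MiB (assumes Linux: reads /proc/self/statm)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # second field = resident pages
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2**20


n = 2**24  # hypothetical size: 16 Mi float64 elements = 128 MiB

before = rss_mib()
b = np.zeros(n)          # creation only
after_create = rss_mib()
b[:] = 0                 # writes to every element, and therefore to every page
after_write = rss_mib()

print(f"array size:     {b.nbytes / 2**20:.0f} MiB")
print(f"RSS before:     {before:.0f} MiB")
print(f"RSS after zeros {after_create:.0f} MiB")
print(f"RSS after write {after_write:.0f} MiB")
```

On a system that behaves as described above, the resident size would be expected to stay roughly flat after np.zeros() and to grow by about the array size after the write; the exact numbers depend on the OS and allocator.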

What is going on?

jmd_dk
  • Does this answer your question: https://stackoverflow.com/questions/44487786/performance-of-zeros-function-in-numpy ? – cadolphs Jan 04 '22 at 17:39
  • If you're on a Unix-like system, your array is mmap'ed (with MAP_PRIVATE) onto /dev/zero. As long as you only read this memory, you get zeros. When you write to it, the OS makes a copy of the page (however big a page is on your OS) and puts it into your address space. You should notice that your memory usage doesn't change until you do an actual write. (This is the same as @Lagerbaer's answer, but with a few more internal details.) – Frank Yellin Jan 04 '22 at 17:45
  • @FrankYellin Interesting. I can still write to single elements, `a[i] = 1`, without the memory going up. Even if I write to many different `i`. How is this possible? – jmd_dk Jan 04 '22 at 18:18
  • Pages are copied and added to your address space as needed. `a[i] = 1` will cause one page to be copied, not the entire array. If you tried `import mmap; a[::mmap.PAGESIZE // a.itemsize] = 1`, you would be modifying one element on every page and would be forcing the entire array into your address space. – Frank Yellin Jan 04 '22 at 19:41
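The page-granularity behavior described in these comments can be illustrated with the standard library alone. This is a sketch under assumptions, not a description of what NumPy's allocator actually does: it assumes a Unix-like OS and creates an anonymous private mapping with `mmap.mmap(-1, ...)`; the buffer size and variable names are chosen only for illustration.

```python
import mmap

size = 64 * 2**20  # hypothetical size: 64 MiB

# Anonymous, private mapping (Unix-only flags). Pages read as zero and are
# typically not backed by physical memory until they are first written.
buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

# Reading untouched memory yields zeros.
first_byte_before = buf[0]

# Write a single byte on every page; each write touches one distinct page.
for offset in range(0, size, mmap.PAGESIZE):
    buf[offset] = 1

pages_touched = size // mmap.PAGESIZE
print(f"page size: {mmap.PAGESIZE} bytes, pages touched: {pages_touched}")
```

For a NumPy array the stride is counted in elements rather than bytes, so one write per page corresponds to a step of `mmap.PAGESIZE // a.itemsize`.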
