
I am using numpy version 1.14.3 and python 2.7.12.

Referring to this question, I see dramatically different speeds when initializing arrays with np.zeros versus np.empty, yet the output is the same.

```python
import numpy as np
r = np.random.random((50, 100, 100))
z = np.zeros(r.shape)
e = np.empty(r.shape)
np.allclose(e, z)
```

This returns True. However, the IPython %timeit magic gives very different results:

```
%timeit z = np.zeros(r.shape)
10000 loops, best of 3: 143 µs per loop

%timeit e = np.empty(r.shape)
1000000 loops, best of 3: 1.83 µs per loop
```
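%timeit only exists inside IPython. A roughly equivalent measurement in a plain script can be made with the standard-library timeit module; this is a minimal sketch, the loop count of 1000 is an arbitrary choice, and absolute numbers will differ by machine and NumPy version.

```python
import timeit

import numpy as np

shape = (50, 100, 100)

# Time n calls of each constructor and keep the best of 3 repeats,
# which is roughly what %timeit reports.
n = 1000
t_zeros = min(timeit.repeat(lambda: np.zeros(shape), number=n, repeat=3)) / n
t_empty = min(timeit.repeat(lambda: np.empty(shape), number=n, repeat=3)) / n

print("np.zeros: {:.2f} us per call".format(t_zeros * 1e6))
print("np.empty: {:.2f} us per call".format(t_empty * 1e6))
```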

The accepted answer to the question referenced above says that np.zeros is always the better choice, and that it is faster too.

Why not use np.empty when it is 80 times faster than np.zeros and returns the same answer?

Edit: As user2285236 pointed out, swapping the order in which z and e are created breaks the equality, because the allocations reuse the same memory area.
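The order dependence can be checked directly. In this sketch the first comparison may print True or False depending on what was left in memory; only after an explicit write is the np.empty result guaranteed to be zero.

```python
import numpy as np

shape = (50, 100, 100)

# np.empty first this time: its contents are whatever the memory already
# held, so this comparison is not guaranteed to be True.
e = np.empty(shape)
z = np.zeros(shape)
print(np.allclose(e, z))

# Writing every element is what makes an np.empty array safe to read.
e[...] = 0.0
print(np.allclose(e, z))  # always True after the explicit fill
```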

Mike
  • There is no guarantee that `np.empty` will return an array full of zeros. – user2357112 Sep 10 '18 at 16:44
  • Also, if you take a closer look at the question you linked, you'll find that the questioner understood this and manually zeroed the `np.empty` return value. – user2357112 Sep 10 '18 at 16:46
  • Yes, I understand, and in previous versions of numpy it was necessary to zero out the array. Now, it appears to return zeros by default. Has something changed? – Mike Sep 10 '18 at 16:48
  • I think you get zeros because you call np.empty just after `z = np.zeros(r.shape)`. Flip the order and you no longer get True for np.allclose. – ayhan Sep 10 '18 at 16:50
  • user2285236 is correct. Flipping the order screws things up. I will edit the post to reflect this insight. – Mike Sep 10 '18 at 16:53
  • For me, I get more zeroes as I increase the size of the array: (2,3,5) has none, (5, 100, 100) has some non-zero values at the start, and (50, 100, 100) is all zero. Which is fine, because np.empty never promised the results _wouldn't_ be zero-- we're just measuring what junk happened to be in memory beforehand. If we were bored enough I guess we could look at the memory allocation patterns, but honestly, once we see that it's not guaranteed to be zero we know what we need to _do_ and the rest doesn't matter much. – DSM Sep 10 '18 at 16:55
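DSM's size experiment can be reproduced with a short loop. This is a sketch whose output is deliberately not shown: the counts reflect whatever happened to be in memory and can change between runs, machines, and NumPy versions.

```python
import numpy as np

# Count how many elements of a fresh np.empty array happen to be nonzero.
# NaN garbage also counts as nonzero here.
counts = {}
for shape in [(2, 3, 5), (5, 100, 100), (50, 100, 100)]:
    e = np.empty(shape)
    counts[shape] = int(np.count_nonzero(e))
    print("{}: {} of {} elements nonzero".format(shape, counts[shape], e.size))
```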


np.empty and np.zeros do different things.

np.empty creates an array from available memory space, leaving whatever values happened to be hanging around in memory as the values. These values may or may not be zeros.
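Because the initial values are arbitrary, np.empty is only appropriate when every element is written before anything reads it. A minimal sketch of two such patterns (the names `out` and `acc` are just illustrative):

```python
import numpy as np

r = np.random.random((50, 100, 100))

# Pattern 1: a ufunc writes every element through out=, so the
# uninitialized contents of the buffer are never read.
out = np.empty(r.shape)
np.multiply(r, 2.0, out=out)

# Pattern 2: a loop that assigns every index before the array is used.
acc = np.empty(r.shape[0])
for i in range(r.shape[0]):
    acc[i] = r[i].sum()
```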

np.zeros creates an array from available memory space and then fills it with zeros for your chosen dtype. np.zeros has to do more work, so it should be slower: it also writes to the allocated memory.
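The guarantee np.zeros gives can be spelled out by hand as np.empty followed by an explicit fill; the fill is the extra work that np.empty skips. A small sketch showing the two produce the same array:

```python
import numpy as np

shape = (50, 100, 100)

# Allocate uninitialized memory, then write zeros into every element.
manual = np.empty(shape)
manual.fill(0.0)

z = np.zeros(shape)
print(np.array_equal(manual, z))  # True
```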

A fairer comparison would be between np.empty and np.ndarray, since neither initializes the memory it returns.
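That comparison can be run with the same timeit approach. Called with only a shape, np.ndarray(shape) returns an uninitialized float64 array just like np.empty(shape). This is a sketch with an arbitrary loop count; the timings depend on the machine and NumPy version.

```python
import timeit

import numpy as np

shape = (50, 100, 100)

# Neither constructor writes zeros; both return uninitialized float64 memory.
a = np.empty(shape)
b = np.ndarray(shape)  # dtype defaults to float64 when no buffer is given
print("dtypes: {} {}, same shape: {}".format(a.dtype, b.dtype, a.shape == b.shape))

n = 10000
t_empty = min(timeit.repeat(lambda: np.empty(shape), number=n, repeat=3)) / n
t_ndarray = min(timeit.repeat(lambda: np.ndarray(shape), number=n, repeat=3)) / n
print("np.empty:   {:.2f} us per call".format(t_empty * 1e6))
print("np.ndarray: {:.2f} us per call".format(t_ndarray * 1e6))
```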

wim