
If you were to choose one of the following three ways of initializing an array with zeros, which one would you choose and why?

my_arr_1 = np.full(size, 0) 

or

my_arr_2 = np.zeros(size)

or

my_arr_3 = np.empty(size)
my_arr_3[:] = 0
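Here is a runnable version of the three snippets, assuming `size = 5` because the question leaves `size` undefined. It shows one difference the question's wording hides. `np.full` infers its dtype from the fill value, so `np.full(size, 0)` gives an integer array, while the other two give `float64`:

```python
import numpy as np

size = 5  # hypothetical size; the question leaves `size` unspecified

my_arr_1 = np.full(size, 0)    # dtype inferred from the fill value 0 -> an integer type
my_arr_2 = np.zeros(size)      # default dtype is float64
my_arr_3 = np.empty(size)      # default dtype is float64; contents are arbitrary...
my_arr_3[:] = 0                # ...until every element is overwritten here

# All three hold zeros, so they compare equal element-wise.
print(np.array_equal(my_arr_1, my_arr_2), np.array_equal(my_arr_2, my_arr_3))

# The dtypes differ: np.full picks an integer type for an integer fill value.
print(my_arr_1.dtype, my_arr_2.dtype, my_arr_3.dtype)

# A float fill value (or an explicit dtype=float) makes np.full match np.zeros.
my_arr_1f = np.full(size, 0.0)
print(my_arr_1f.dtype == my_arr_2.dtype)
```

If the three options are meant to be interchangeable, `np.full(size, 0.0)` or `np.full(size, 0, dtype=float)` is the closer equivalent of `np.zeros(size)`.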
Dataman

5 Answers

I'd use np.zeros, because of its name. I would never use the third idiom because

  1. it takes two statements instead of a single expression and

  2. it's harder for the NumPy folks to optimize. In fact, in NumPy 1.10, np.zeros is still the fastest option, despite all the optimizations to indexing:

>>> %timeit np.zeros(1e6)
1000 loops, best of 3: 804 µs per loop
>>> %timeit np.full(1e6, 0)
1000 loops, best of 3: 816 µs per loop
>>> %timeit a = np.empty(1e6); a[:] = 0
1000 loops, best of 3: 919 µs per loop

Bigger array for comparison with @John Zwinck's results:

>>> %timeit np.zeros(1e8)
100000 loops, best of 3: 9.66 µs per loop
>>> %timeit np.full(1e8, 0)
1 loops, best of 3: 614 ms per loop
>>> %timeit a = np.empty(1e8); a[:] = 0
1 loops, best of 3: 229 ms per loop
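The `%timeit` lines above are IPython magics, and they pass float sizes such as `1e6`, which current NumPy versions reject with a `TypeError`. The sketch below runs the same comparison in plain Python with the standard `timeit` module and integer sizes. The function names and the `touch` option are hypothetical additions. With `touch=True`, each result is summed so its memory is actually read, following the suggestion in the comments on this answer. Absolute numbers will vary with machine, OS, and NumPy version.

```python
import timeit

import numpy as np

N = 10**6  # integer size; recent NumPy versions reject float shapes such as 1e6


def with_zeros(n):
    return np.zeros(n)


def with_full(n):
    return np.full(n, 0.0)  # float fill value so the dtype matches np.zeros


def with_empty_then_assign(n):
    a = np.empty(n)
    a[:] = 0
    return a


def benchmark(n=N, number=20, touch=False):
    """Return the best-of-3 time per call, in seconds, for each idiom.

    With touch=True each new array is summed, so its memory is actually
    read instead of possibly staying as untouched, lazily zeroed pages.
    """
    results = {}
    for func in (with_zeros, with_full, with_empty_then_assign):
        if touch:
            stmt = lambda f=func: f(n).sum()  # default arg binds this iteration's func
        else:
            stmt = lambda f=func: f(n)
        best = min(timeit.repeat(stmt, number=number, repeat=3))
        results[func.__name__] = best / number
    return results


if __name__ == "__main__":
    for touch in (False, True):
        print(f"touch={touch}")
        for name, seconds in benchmark(touch=touch).items():
            print(f"  {name:<24} {seconds * 1e6:10.1f} µs per call")
```

If the lazy-zeroing explanation in the comments is right, the gap between `np.zeros` and the other two should shrink for large sizes when `touch=True`. Treat this as a hypothesis to check on your own machine, not a measured result.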
Dany
Fred Foo
  • "full" and assignment are identical, though I'd like to mention that I prefer `a[...] = 0` over `a[:] = 0`. Nowadays, zeros tells the kernel to zero the memory. – seberg Oct 06 '14 at 10:02
  • Nope, but it isn't in my measurements :) – seberg Oct 07 '14 at 15:05
  • @seberg: Based on that, and the times for the 1e6 and 1e8 cases, I'd guess that `np.zeros()` ends up using an anonymous `mmap()` beyond some threshold (`np.empty()` probably uses it too, with the `MAP_UNINITIALIZED` flag set), and the memory doesn't get zeroed, or even properly allocated, before it's first read or written, making the timings more or less useless. (Doing e.g. `np.sum()` on the arrays in all the measurements would probably give more reasonable results.) – Aleksi Torhamo Nov 21 '18 at 22:20
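seberg's comment prefers `a[...] = 0` over `a[:] = 0`. For arrays with at least one dimension, the two fill the same elements. One case where they differ is a 0-d array, which has no axis for `:` to slice. This is offered as an illustration, not as seberg's stated reason:

```python
import numpy as np

a = np.empty((2, 3))
a[...] = 0           # Ellipsis: "every element", whatever the number of dimensions
b = np.empty((2, 3))
b[:] = 0             # slice over the first axis; the remaining axes are implied
print(np.array_equal(a, b))   # identical result for arrays with >= 1 dimension

s = np.empty(())     # a 0-d (scalar-shaped) array
s[...] = 0           # works: Ellipsis addresses the whole 0-d array
try:
    s[:] = 0         # a slice needs an axis to act on, and a 0-d array has none
except IndexError as exc:
    print("a[:] on a 0-d array fails:", exc)
print(s[()])         # read the single element back out
```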

Definitely np.zeros. Not only is it the most idiomatic and common way to do this, it is also by far the fastest:

In [1]: size=100000000

In [3]: %timeit np.full(size, 0)
1 loops, best of 3: 344 ms per loop

In [4]: %timeit np.zeros(size)
100000 loops, best of 3: 8.75 µs per loop

In [5]: %timeit a = np.empty(size); a[:] = 0
1 loops, best of 3: 322 ms per loop
John Zwinck

np.zeros is much faster than the other two options if one wants to initialize an array to zeros. If one just wants an array of a given shape and type and doesn't care about its initial entries, np.empty is slightly faster.

See the following basic test results:

>>> %timeit np.zeros(1000000)
7.89 µs ± 282 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit np.empty(1000000)
7.84 µs ± 332 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Georgy
l001d
  • You are comparing apples with oranges. `np.zeros(n)` doesn't give the same result as `np.empty(n)`. And this is already explained in @FredFoo's solution. – jpp Sep 06 '18 at 11:00
  • OK, thanks for correcting me. So if just initializing, np.empty is slightly faster; if initializing to zero, np.zeros is much faster. – l001d Sep 06 '18 at 15:56
  • Sure. But there are 2 distinct issues: (1) different output, (2) speed. In practice, only (1) matters. – jpp Sep 06 '18 at 15:57
  • @jpp I think this comparison addresses the case where we do not care about the values in the initial array: is there then an advantage to using `np.empty` over `np.zeros`? – Jean Paul Sep 17 '22 at 08:52
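As the comments say, `np.empty` only makes sense when the initial values do not matter. That is the case when every element is overwritten before anything reads it, for example when a ufunc writes into the array through its `out` argument. A minimal sketch, where the function name `squares_into_empty` is hypothetical:

```python
import numpy as np


def squares_into_empty(x):
    """Square x element-wise into a freshly allocated, uninitialized buffer.

    np.empty is fine here because np.multiply(..., out=out) overwrites every
    element of `out` before anything reads it, so initializing it to zeros
    first would not change the result.
    """
    out = np.empty(x.shape, dtype=x.dtype)
    np.multiply(x, x, out=out)
    return out


print(squares_into_empty(np.arange(5.0)))
```

If only part of the buffer were written, the untouched elements would keep whatever bytes were already in memory. That is the "different output" issue raised in the comments, and in that situation `np.zeros` is the correct choice regardless of speed.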
np.zeros(): always 0
np.empty(): arbitrary values, depending on whatever was previously in that memory

Compare the following:

>>> np.zeros((3, 4))
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

>>> np.empty((3, 4))
array([[1.13224202e+277, 1.73151846e-077, 1.24374310e-047, 1.30455491e-076],
       [3.92384790e+179, 6.01353875e-154, 3.12452337e-033, 7.72229932e+140],
       [1.28654694e-320, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000]])
Anastasios Selmani
chenxuZhu

First, we should understand the differences between these three, which will help us choose between them.

  1. np.zeros(size): produces an array of all 0s with the given shape.
np.zeros(5)    # array([0., 0., 0., 0., 0.])
  2. np.empty(size): creates an array whose initial content is arbitrary and depends on the state of the memory.
np.empty(4)    # array([0.00000000e+000, 1.05915457e-311, 1.05915457e-311, 1.05915457e-311])
  3. np.full(size, fill_value): returns a new array of the given shape and type, filled with fill_value.
np.full((2, 2), 10)    # array([[10, 10],
                       #        [10, 10]])

So, in this case np.zeros(size) is clearly the right choice, and also the fastest way to create an array filled with zeros.

Haroon Hayat