My timings in IPython (with its simpler timeit interface) are:
In [57]: timeit np.zeros_like(x)
1 loops, best of 3: 420 ms per loop
In [58]: timeit np.zeros((12488, 7588, 3), np.uint8)
100000 loops, best of 3: 15.1 µs per loop
When I look at the code with IPython (np.zeros_like??) I see:
res = empty_like(a, dtype=dtype, order=order, subok=subok)
multiarray.copyto(res, 0, casting='unsafe')
while np.zeros is a black box: pure compiled code.
Timings for empty are:
In [63]: timeit np.empty_like(x)
100000 loops, best of 3: 13.6 µs per loop
In [64]: timeit np.empty((12488, 7588, 3), np.uint8)
100000 loops, best of 3: 14.9 µs per loop
So the extra time in zeros_like is in that copyto step, which writes a 0 into every element of the new array.
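The two lines quoted from np.zeros_like can be tried on their own. This is only a minimal sketch: zeros_like_sketch is a hypothetical name, the dtype/order/subok arguments of the real function are left out, and np.copyto is assumed to be the public counterpart of the multiarray.copyto call shown above.

```python
import numpy as np

def zeros_like_sketch(a):
    """Hypothetical stand-in for np.zeros_like, following the two quoted lines."""
    res = np.empty_like(a)               # allocate only, contents are arbitrary
    np.copyto(res, 0, casting='unsafe')  # touch every element to write 0
    return res

x = np.ones((4, 3), dtype=np.uint8)
z = zeros_like_sketch(x)
print(z.shape, z.dtype, z.any())
```

The allocation is cheap; the second line has to visit the whole buffer, which is where the time goes for a large array.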
In my tests, the difference in assignment times (x[]=1) is negligible.
My guess is that zeros, ones, and empty were all early compiled creations. empty_like was added as a convenience, just drawing shape and type info from its input. zeros_like was written with more of an eye toward easy maintenance (reusing empty_like) than toward speed.
np.ones and np.full also use the np.empty ... copyto sequence, and show similar timings.
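To repeat these comparisons outside IPython, something like the following could be used. It is a sketch under assumptions: the shape is much smaller than the (12488, 7588, 3) array above so that it runs quickly, best_time is a hypothetical helper, and the fill value 7 is arbitrary. Absolute numbers, and possibly the ranking, will depend on the NumPy version and the machine.

```python
import timeit
import numpy as np

shape = (1000, 1000, 3)           # assumption: scaled down from the array above
x = np.zeros(shape, np.uint8)

candidates = {
    "zeros":      lambda: np.zeros(shape, np.uint8),
    "empty":      lambda: np.empty(shape, np.uint8),
    "ones":       lambda: np.ones(shape, np.uint8),
    "full":       lambda: np.full(shape, 7, np.uint8),
    "zeros_like": lambda: np.zeros_like(x),
    "empty_like": lambda: np.empty_like(x),
}

def best_time(func, repeat=3, number=10):
    """Best per-call time in seconds, like IPython's 'best of 3'."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

timings = {name: best_time(func) for name, func in candidates.items()}
for name, seconds in timings.items():
    print(f"{name:>10}: {seconds * 1e6:10.1f} us per call")
```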
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/array_assign_scalar.c appears to be the file that copies a scalar (such as 0) to an array. I don't see a use of memset.
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/alloc.c has calls to malloc and calloc.
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c - source for zeros and empty. Both call PyArray_NewFromDescr_int, but one ends up using npy_alloc_cache_zero and the other npy_alloc_cache.
npy_alloc_cache in alloc.c calls alloc. npy_alloc_cache_zero calls npy_alloc_cache followed by a memset. The code in alloc.c is further complicated by a THREAD option.
More on the calloc v malloc+memset difference at:
Why malloc+memset is slower than calloc?
But with caching and garbage collection, I wonder whether the calloc/memset distinction applies.
This simple test with the memory_profiler package supports the claim that zeros and empty allocate memory 'on-the-fly', while zeros_like allocates everything up front:
N = (1000, 1000)
M = (slice(None, 500, None), slice(500, None, None))
Line #    Mem usage    Increment   Line Contents
================================================
     2   17.699 MiB    0.000 MiB   @profile
     3                             def test1(N, M):
     4   17.699 MiB    0.000 MiB       print(N, M)
     5   17.699 MiB    0.000 MiB       x = np.zeros(N)        # no memory jump
     6   17.699 MiB    0.000 MiB       y = np.empty(N)
     7   25.230 MiB    7.531 MiB       z = np.zeros_like(x)   # initial jump
     8   29.098 MiB    3.867 MiB       x[M] = 1               # jump on usage
     9   32.965 MiB    3.867 MiB       y[M] = 1
    10   32.965 MiB    0.000 MiB       z[M] = 1
    11   32.965 MiB    0.000 MiB       return x,y,z