
I just noticed that NumPy's `zeros` function has some strange behavior:

%timeit np.zeros((1000, 1000))
1.06 ms ± 29.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.zeros((5000, 5000))
4 µs ± 66 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

On the other hand, `ones` seems to behave normally. Does anybody know why initializing a small NumPy array with the `zeros` function takes more time than initializing a large one?

(Python 3.5, numpy 1.11)

Ipse Lium
  • So the second matrix is 25 times larger, but only takes 4 times longer to create? That is surprising. – President James K. Polk Jun 11 '17 at 19:28
  • @JamesKPolk read it again, the second, larger array takes 4 microseconds, the first, smaller array takes 1 millisecond! I'm getting similar, though less extreme results. – juanpa.arrivillaga Jun 11 '17 at 19:30
  • I think this is probably `calloc` hitting a threshold where it requests zeroed memory from the OS and doesn't need to actually initialize it. – user2357112 Jun 11 '17 at 19:30
  • When the size S of a 1D array changes from 4,150,000 to 4,200,000, the time to zero it with `np.zeros(S)` changes from 5.5 ms per loop to 9.6 µs per loop. However, the number of loops in `%timeit` simultaneously changes from 100 to 100,000. My guess is that for arrays of a certain size and above, the difference between the slowest and fastest runs becomes large enough to trigger 1000 times more loops, which drastically improves the measurement accuracy and reduces the reported running time. Not because it is shorter, but because it is measured more accurately. – DYZ Jun 11 '17 at 19:31
  • @juanpa.arrivillaga: Oops, you're right. Even more surprising! – President James K. Polk Jun 11 '17 at 19:32
  • 1
    @DYZ I'm using the `timeit.timeit` function, controlling the number at `1000`, and I'm getting `0.343710215005558` for (1000,1000) and `0.0028691469924524426` for (5000,5000) – juanpa.arrivillaga Jun 11 '17 at 19:34
  • The exact threshold for me is 4193790: `np.zeros(4193789)` is slow, `np.zeros(4193790)` is fast. I'd be interested to know exactly why 4193790 * 8 bytes (since we're allocating an array of the float64 type) is special. – Alex Riley Jun 11 '17 at 19:47
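
To pin this down without `%timeit`'s adaptive loop count muddying the comparison, one can fix the number of iterations with `timeit.repeat`. A minimal sketch, using the sizes from the comment above (the loop and repeat counts are arbitrary choices):

import timeit

import numpy as np

# Sizes straddling the threshold reported in the comment above
# (4193790 * 8 bytes is roughly 32 MiB).
for n in (4193789, 4193790):
    # 5 repeats of 100 allocations each; report the best repeat.
    best = min(timeit.repeat(lambda n=n: np.zeros(n), number=100, repeat=5))
    print("np.zeros(%d): %.2f µs per allocation" % (n, best / 100 * 1e6))

With a fixed loop count the large gap persists (as juanpa.arrivillaga's `timeit.timeit` numbers above also show), which argues against the measurement-accuracy explanation and for an allocator threshold.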

1 Answer


This looks like `calloc` hitting a threshold where it makes an OS request for zeroed memory and doesn't need to initialize it manually. Looking through the source code, `numpy.zeros` eventually delegates to `calloc` to acquire a zeroed memory block, and if you compare to `numpy.empty`, which doesn't perform initialization:

In [15]: %timeit np.zeros((5000, 5000))
The slowest run took 12.65 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10 µs per loop

In [16]: %timeit np.empty((5000, 5000))
The slowest run took 5.05 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.3 µs per loop

you can see that `np.zeros` has no initialization overhead for the 5000x5000 array.
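
To make the threshold visible, here is a rough sketch (not part of the original answer) comparing `np.zeros`, `np.empty`, and `np.ones` at both sizes; the exact crossover depends on your allocator and OS:

import timeit

import numpy as np

for side in (1000, 5000):
    for fn in (np.zeros, np.empty, np.ones):
        # A few fixed-size repeats; take the best to reduce noise.
        t = min(timeit.repeat(lambda fn=fn, side=side: fn((side, side)),
                              number=20, repeat=3))
        print("%s((%d, %d)): %.3f ms" % (fn.__name__, side, side, t / 20 * 1e3))

Below the threshold, `zeros` costs roughly as much as `ones`, since the zeroing happens in already-mapped memory; above it, `zeros` is about as cheap as `empty`, because the OS hands back pages that are already zeroed. `ones` always pays for initialization, which is the "normal behavior" the question observed.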

In fact, the OS isn't even "really" allocating that memory until you try to access it. A request for an array of terabytes succeeds even on a machine that doesn't have terabytes to spare:

In [23]: x = np.zeros(2**40)  # No MemoryError!
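
A small sketch of how one might observe this lazy allocation, assuming a Unix-like system (the 1 GiB size and the use of `resource.getrusage` are illustrative choices, not from the original answer):

import resource

import numpy as np

def peak_rss_mib():
    # ru_maxrss is reported in KiB on Linux (in bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

print("before allocation: %6.0f MiB" % peak_rss_mib())
x = np.zeros(2**27)   # 2**27 float64 values = 1 GiB, requested but untouched
print("after np.zeros:    %6.0f MiB" % peak_rss_mib())
x[:] = 1.0            # writing faults the pages in, so they become resident
print("after writing:     %6.0f MiB" % peak_rss_mib())

The resident set size barely moves after the allocation and only jumps once the pages are actually written.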
user2357112
  • In NumPy 1.21.1, the last command results in `numpy.core._exceptions.MemoryError: Unable to allocate 8.00 TiB for an array with shape (1099511627776,) and data type float64`. – A. Donda Aug 04 '21 at 18:52
  • @A.Donda: That's probably OS-dependent. (I'm surprised NumPy has its own MemoryError class, though.) – user2357112 Aug 04 '21 at 19:13