
I have a large NumPy array whose values I need to reset to 0 regularly. I tried these:

x = np.zeros((10**4, 10**6), dtype=np.float32)
%timeit x[:7000, :] = 0.0  # 4 seconds
%timeit x[:7000, :].fill(0.0)  # 4 seconds

On the other hand, creating a new array is much faster:

%timeit x = np.zeros((10**4, 10**6), dtype=np.float32)  # 8 microseconds

However, the new array has a different memory address, which significantly decreases the performance of subsequent copies.

Is there a way to reset the array values to 0 as fast as creating a new array?

Otherwise, is there a way to create a new zeros array that keeps the same memory address?
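For reference, in-place zeroing (broadcast assignment or `fill`) does keep the buffer address, which can be checked via `ndarray.ctypes.data`. A small sketch on a deliberately small array:

```python
import numpy as np

x = np.zeros((100, 100), dtype=np.float32)
addr = x.ctypes.data  # address of the underlying data buffer

x[:] = 1.0   # dirty the array
x.fill(0.0)  # reset in place

# in-place writes reuse the same buffer, so the address is unchanged
assert x.ctypes.data == addr
assert not x.any()  # every element is back to 0.0
```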

Update: Some concrete tests:

x = np.zeros((10**3, 10**6), dtype=np.float32)

%timeit x[100:300, 100:200000].sum()                                                                                                  
# 12 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit x = np.zeros((10**3, 10**6), dtype=np.float32); x[100:300, 100:200000].sum()                                                  
# 42.4 ms ± 912 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit x[200:1000, 200:1000000] = 0.0; x[100:300, 100:200000].sum()                                                                  
# 413 ms ± 55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit x = np.ones((10**3, 10**6), dtype=np.float32); x[100:300, 100:200000].sum()                                                   
# 1.91 s ± 286 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Accessing the array after resetting values, np.zeros takes only 42 ms compared to 413 ms for broadcast assignment. I would appreciate it if someone has a solution for a faster reset to 0. But I'm also open to being convinced that np.zeros is actually not faster, given clear evidence.
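One plausible explanation for these numbers is that `np.zeros` obtains zeroed pages lazily from the OS, so the allocation call returns quickly and the fill cost is paid on first touch. A rough, machine-dependent way to observe this (the array size here is an arbitrary choice, just large enough to trigger an mmap-style allocation):

```python
import time
import numpy as np

n = 10**7  # ~40 MB of float32

t0 = time.perf_counter()
x = np.zeros(n, dtype=np.float32)  # typically returns almost immediately
alloc_time = time.perf_counter() - t0

t0 = time.perf_counter()
x[:] = 1.0  # first touch: the OS must actually provide the pages now
touch_time = time.perf_counter() - t0

print(f"alloc: {alloc_time*1e3:.2f} ms, first full write: {touch_time*1e3:.2f} ms")
assert x.sum() == n  # sanity check: the write really happened
```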

Update: I ended up migrating some heavy parts to PyTorch to utilize fast GPU memory, and did other code optimizations, gaining a 2x speedup in the process.

THN
  • Did you try `x *= 0`? – ddejohn Sep 28 '21 at 18:58
  • @ddejohn That is 3.37 seconds – THN Sep 28 '21 at 19:00
  • 1
    `x[:]=0` as seen in this [post](https://stackoverflow.com/questions/17482955/how-to-zero-out-rows-columns-in-an-array) might be faster – rachelyw Sep 28 '21 at 19:01
  • @rachelyw That is 5.78 seconds – THN Sep 28 '21 at 19:04
  • Actually I was surprised that resetting values is a million times slower than creating a new array. There should be some trick to reset faster, or to create a new zeros array at the same memory address. – THN Sep 28 '21 at 19:06
  • Does this answer your question? [Performance difference between filling existing numpy array and creating a new one](https://stackoverflow.com/questions/31498784/performance-difference-between-filling-existing-numpy-array-and-creating-a-new-o) – ddejohn Sep 28 '21 at 19:12
  • PS - you can use `x = np.zeros(x.shape, x.dtype)` – ddejohn Sep 28 '21 at 19:12
  • @ddejohn Thanks for the link! The answer over there further confirms my observation that, for large arrays, creating a new zeros array is much faster than resetting its values. However, there is still no solution for quickly resetting the array while keeping the memory address. – THN Sep 28 '21 at 19:18
  • 1
    The byte representation of 0.0 consists of zero-bytes only. So my rough idea would be to create the necessary memory with ctypes, hand it over to numpy and use "ctypes.memset" to reset the memory. – Michael Butscher Sep 28 '21 at 19:33
  • 5
    Does this answer your question? [Why is Numpy much faster at creating a Zero array compared to replacing the values of an existing array with zeros?](https://stackoverflow.com/questions/67270937/why-is-numpy-much-faster-at-creating-a-zero-array-compared-to-replacing-the-valu). Put it shortly, the `np.zeros` appear to be faster, but writing in the array will be much slower then. It is not possible to fill about 37 GiB in 8 microseconds. – Jérôme Richard Sep 28 '21 at 19:41
  • Sounds like `reset to 0` is no different from `set to 0` or `set to 23`. A new value has to be copied to each element of the array's data-buffer. At 30+GB that's going to take time, regardless of the (compiled) iteration method and wrappers. – hpaulj Sep 28 '21 at 20:06
  • @JérômeRichard I'm pretty sure that `np.zeros` is actually faster than broadcast assignment, because I could access/read/write values to `x` at the same speed in both cases. Maybe there is a low-level routine that can reset the memory to 0 specifically, not to arbitrary values such as 1. – THN Sep 29 '21 at 03:35
  • @MichaelButscher That is a neat idea. I guess `np.zeros` does something along those lines. But it's not clear how we can do that explicitly to reset (a slice of) an array. – THN Sep 29 '21 at 04:11
  • 1
    Slices of multidimensional arrays are complicated but to reset the complete array `x` use `ctypes.memset(x.ctypes.data, 0, x.nbytes)` (this is easier than I originally thought). – Michael Butscher Sep 29 '21 at 05:50
  • @THN The point is you cannot fill values faster than your memory allows. `np.zeros` appears to be much faster because it uses *virtual memory*. The costs are just *lazily delayed*. It is faster to work on the array only if you do not fully fill it, thanks to OS paging, but in the end the array will certainly be totally filled and the cost paid. Note that a broadcast may be a bit slower overall due to missing non-temporal writes in some Numpy implementations (there is nothing you can do about it except asking the Numpy developers to use them). – Jérôme Richard Sep 29 '21 at 08:43
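Michael Butscher's `ctypes.memset` suggestion above can be sketched as follows. Note this is only valid for a C-contiguous array and for dtypes whose zero value is represented by all zero-bytes (true for `float32`):

```python
import ctypes
import numpy as np

x = np.ones((1000, 1000), dtype=np.float32)
assert x.flags['C_CONTIGUOUS']  # memset assumes one contiguous buffer

# float32 0.0 is all zero-bytes, so zeroing the raw memory is safe here
ctypes.memset(x.ctypes.data, 0, x.nbytes)

assert not x.any()  # every element is now 0.0
```

For non-contiguous slices this does not apply directly, since the bytes to clear are not one contiguous run.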

0 Answers