I have a large numpy array whose values I need to reset to 0 regularly. I tried these:
import numpy as np

x = np.zeros((10**4, 10**6), dtype=np.float32)
%timeit x[:7000, :] = 0.0 # 4 seconds
%timeit x[:7000, :].fill(0.0) # 4 seconds
On the other hand, creating a new array is much faster:
%timeit x = np.zeros((10**4, 10**6), dtype=np.float32) # 8 microseconds
However, the new array lives at a different memory address, which significantly decreases the performance of subsequent copying.
Is there a way to reset the array values to 0 as fast as creating a new array?
Otherwise, is there a way to create a new array of zeros that keeps the same memory address?
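For context, here is how I would check whether an operation keeps the same underlying buffer, together with two more in-place candidates (np.copyto and a raw ctypes.memset over the buffer) that I have not benchmarked above; this is only an illustrative sketch on a smaller shape, not a measured result:

import ctypes
import numpy as np

x = np.zeros((10**2, 10**6), dtype=np.float32)   # smaller than above (~400 MB), just for illustration
addr_before = x.ctypes.data                      # address of the underlying buffer

x[:70, :] = 0.0                                  # in-place broadcast assignment
x.fill(0.0)                                      # in-place fill
np.copyto(x, 0.0)                                # in-place copy of a scalar into the existing buffer
ctypes.memset(x.ctypes.data, 0, x.nbytes)        # raw memset; valid because float32 0.0 is all-zero bytes
print(x.ctypes.data == addr_before)              # True: all of the above keep the address

x = np.zeros((10**2, 10**6), dtype=np.float32)   # rebinds x to a freshly allocated array
print(x.ctypes.data == addr_before)              # typically False: a different buffer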
Update: Some concrete tests:
x = np.zeros((10**3, 10**6), dtype=np.float32)
%timeit x[100:300, 100:200000].sum()
# 12 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit x = np.zeros((10**3, 10**6), dtype=np.float32); x[100:300, 100:200000].sum()
# 42.4 ms ± 912 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit x[200:1000, 200:1000000] = 0.0; x[100:300, 100:200000].sum()
# 413 ms ± 55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = np.ones((10**3, 10**6), dtype=np.float32); x[100:300, 100:200000].sum()
# 1.91 s ± 286 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Accessing the array after resetting its values, np.zeros takes only 42 ms compared to 413 ms for broadcast assignment. I would appreciate it if someone has a solution for a faster reset to 0, but I'm also open to being convinced, with clear evidence, that np.zeros is actually not faster.
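One explanation I would like to rule in or out is that np.zeros only appears free because the OS hands back lazily mapped zero pages, so the actual zeroing cost is deferred to the first write. A rough, unpolished way to check this on the same ~4 GB shape as above, timing allocation alone versus the first and second full writes:

import time
import numpy as np

shape = (10**3, 10**6)                    # same ~4 GB float32 array as in the tests above

t0 = time.perf_counter()
y = np.zeros(shape, dtype=np.float32)
print(f"allocate only:     {time.perf_counter() - t0:.3f} s")   # near-instant if pages are mapped lazily

t0 = time.perf_counter()
y[:] = 0.0                                # touches every page, forcing the OS to back them
print(f"first full write:  {time.perf_counter() - t0:.3f} s")

t0 = time.perf_counter()
y[:] = 0.0                                # the same write again, now on already-backed pages
print(f"second full write: {time.perf_counter() - t0:.3f} s")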
Update: I ended up migrating some heavy parts to torch to utilize fast GPU memory, and made other code optimizations, gaining a 2x speedup in the process.
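For completeness, torch also allows resetting a tensor in place on the GPU; this is only a minimal sketch of that kind of reset (assuming a CUDA-capable GPU and a shape small enough to fit in its memory), not the code I actually ended up with:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.zeros((10**3, 10**5), dtype=torch.float32, device=device)   # ~400 MB so it fits on most GPUs

x.zero_()                        # in-place reset of the whole tensor: same storage, same device pointer
x[100:300, 100:20000] = 0.0      # in-place reset of a sub-block, mirroring the numpy slicing above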