I recently noticed something odd when allocating and copying NumPy arrays:
Allocating an array takes constant time (with respect to array size), and copying the contents of another array into the allocated array takes time that increases with array size. The problem is that the time for doing both operations, allocation followed by copying, is not merely the sum of the times for the individual operations (see the figure below):
t(allocation + copy) > t(allocation) + t(copy).
I cannot see the reason for the extra time elapsed (which rapidly increases with size).
Here's the code I used for timing. The timing was performed on Debian Stretch with an Intel Core i3 CPU (2.13 GHz).
import numpy as np
import gc
from timeit import default_timer as timer
import matplotlib.pyplot as plt

def time_all(dim1):
    N_TIMES = 10
    shape = (dim1, dim1)
    data_1 = np.empty(shape, np.int16)
    data_2 = np.random.randint(0, 2**14, shape, np.int16)

    # allocate array
    t1 = timer()
    for _ in range(N_TIMES):
        data_1 = np.empty(shape, np.int16)
    alloc_time = (timer() - t1) / N_TIMES

    # copy array
    t1 = timer()
    for _ in range(N_TIMES):
        data_1[:] = data_2
    copy_time = (timer() - t1) / N_TIMES

    # allocate & copy array
    t1 = timer()
    for _ in range(N_TIMES):
        data_3 = np.empty(shape, np.int16)
        np.copyto(data_3, data_2)
    alloc_copy_time = (timer() - t1) / N_TIMES

    return alloc_time, copy_time, alloc_copy_time
#END def

# measure elapsed times
gc.disable()  # disable automatic garbage collection
times_elapsed = np.array([(size, ) + time_all(size)
                          for size in np.logspace(2, 14, 1<<8,
                                                  endpoint=True, base=2, dtype=int)])
gc.enable()

# plot results
plt.plot(times_elapsed[:,0], times_elapsed[:,1], marker='+', lw=0.5, label="alloc")
plt.plot(times_elapsed[:,0], times_elapsed[:,2], marker='+', lw=0.5, label="copy")
plt.plot(times_elapsed[:,0], times_elapsed[:,3], marker='+', lw=0.5, label="alloc&copy")
plt.xlabel("array dim.")
plt.legend()
plt.savefig("alloc_copy_time.svg")
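One way to narrow this down might be to check whether the extra cost is tied to the first write into a freshly allocated array, as opposed to later writes into the same buffer. The sketch below is only a diagnostic under that assumption; nothing in the measurements above confirms it as the cause. The helper name `first_vs_second_write` and the chosen sizes are made up for illustration.

```python
import numpy as np
from timeit import default_timer as timer


def first_vs_second_write(dim, dtype=np.int16):
    """Time the first and the second full write into a freshly allocated array.

    Hypothetical helper for illustration; not part of the timing code above.
    """
    src = np.ones((dim, dim), dtype)   # source array, every element already written
    dst = np.empty((dim, dim), dtype)  # fresh allocation, contents not written yet

    t0 = timer()
    np.copyto(dst, src)                # first write into the new buffer
    first = timer() - t0

    t0 = timer()
    np.copyto(dst, src)                # second write into the same buffer
    second = timer() - t0

    return first, second


if __name__ == "__main__":
    for dim in (256, 1024, 4096):
        first, second = first_vs_second_write(dim)
        print(f"dim={dim:5d}  first write: {first:.6f} s  second write: {second:.6f} s")
```

If the first write turned out to be consistently slower than the second for large arrays, that would suggest the extra time belongs to touching the new memory rather than to `np.empty` or the copy itself. Single-shot timings like these are noisy, so the comparison would need to be repeated before drawing conclusions.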