
I recently ran into a timing problem when allocating and copying NumPy arrays:

Allocating an array takes constant time (with respect to array size), while copying the contents of another array into the allocated array takes time that increases with array size. The problem is that the time needed to do both operations together, allocation and copying, is not simply the sum of the times of the individual operations (see the figure below):

t(allocation + copy) > t(allocation) + t(copy).

I cannot see the reason for this extra time, which grows rapidly with array size.

[Figure: NumPy allocation + copy]

Here's the code I used for timing. The timings were taken on Debian Stretch with an Intel Core i3 CPU (2.13 GHz).

import numpy as np
import gc
from timeit import default_timer as timer
import matplotlib.pyplot as plt

def time_all(dim1):
   N_TIMES = 10
   shape = (dim1, dim1)
   data_1 = np.empty(shape, np.int16)
   data_2 = np.random.randint(0, 2**14, shape, np.int16)

   # allocate array
   t1 = timer()
   for _ in range(N_TIMES):
      data_1 = np.empty(shape, np.int16)
   alloc_time = (timer() - t1) / N_TIMES

   # copy array
   t1 = timer()
   for _ in range(N_TIMES):
      data_1[:] = data_2
   copy_time = (timer() - t1) / N_TIMES

   # allocate & copy array 
   t1 = timer()
   for _ in range(N_TIMES):
      data_3 = np.empty(shape, np.int16)
      np.copyto(data_3, data_2)
   alloc_copy_time = (timer() - t1) / N_TIMES

   return alloc_time, copy_time, alloc_copy_time
#END def

# measure elapsed times
gc.disable() # disable automatic garbage collection
times_elapsed = np.array([(size, ) + time_all(size)
                for size in np.logspace(2, 14, 1<<8,
                endpoint=True, base=2, dtype=int)])
gc.enable()

# plot results
plt.plot(times_elapsed[:,0], times_elapsed[:,1], marker='+', lw=0.5, label="alloc")
plt.plot(times_elapsed[:,0], times_elapsed[:,2], marker='+', lw=0.5, label="copy")
plt.plot(times_elapsed[:,0], times_elapsed[:,3], marker='+', lw=0.5, label="alloc&copy")
plt.xlabel("array dim.")
plt.legend()
plt.savefig("alloc_copy_time.svg")
feedMe
AlQuemist
  • `np.empty` and `np.zeros` are different in how they allocate space; it's been explored in other SO questions. – hpaulj Dec 15 '18 at 21:32
  • `np.empty` is more of a potential allocation; it does not actually fill in any values. Your copy test writes values to an existing array; the first time it will fill the `empty` array, and the rest of the repeats just copy to an existing array. The third test does a full create and copy. It would be interesting to see what using `np.zeros` instead of `np.empty` does. – hpaulj Dec 16 '18 at 03:16
  • I explored these operations in https://stackoverflow.com/questions/27464039/why-the-performance-difference-between-numpy-zeros-and-numpy-zeros-like. Search on 'np.empty vs np.zeros' will produce other SO discussions. – hpaulj Dec 16 '18 at 03:29
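
The distinction raised in the comments can be checked with a small experiment. The sketch below is only an illustration, not a confirmed diagnosis: it assumes the extra cost appears when a freshly created `np.empty` array is written for the first time, and so it times a copy into a buffer that was already written once against create-and-copy with `np.empty` and with `np.zeros`. The function name `time_copy_variants` and its parameters are hypothetical, and the results will depend on the OS, the allocator, and the array size.

```python
import numpy as np
from timeit import default_timer as timer


def time_copy_variants(dim, n_times=5):
    """Return average seconds per call for three ways of copying `src`.

    Hypothetical helper for illustration; not part of the original question.
    """
    shape = (dim, dim)
    src = np.random.randint(0, 2**14, shape, np.int16)

    def average(fn):
        # Average wall-clock time of `fn` over `n_times` calls.
        t0 = timer()
        for _ in range(n_times):
            fn()
        return (timer() - t0) / n_times

    # Destination that has been written once before timing starts,
    # so every timed copy goes into memory that was already used.
    reused = np.empty(shape, np.int16)
    reused[:] = 0

    def copy_into_reused():
        np.copyto(reused, src)

    def empty_then_copy():
        # New array on every call; its first write happens inside the timing.
        dst = np.empty(shape, np.int16)
        np.copyto(dst, src)

    def zeros_then_copy():
        # Same as above, but with np.zeros, as the comments suggest trying.
        dst = np.zeros(shape, np.int16)
        np.copyto(dst, src)

    return {
        "copy into reused buffer": average(copy_into_reused),
        "np.empty + copy": average(empty_then_copy),
        "np.zeros + copy": average(zeros_then_copy),
    }


if __name__ == "__main__":
    for dim in (256, 1024, 4096):
        for label, seconds in time_copy_variants(dim).items():
            print(f"dim={dim:5d}  {label:25s} {seconds * 1e3:9.3f} ms")
```

If the "copy into reused buffer" time stays clearly below the two create-and-copy times as `dim` grows, that would be consistent with the comments: the copy loop in the question reuses `data_1` after its first write, whereas the third loop writes into a brand-new array on every iteration.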
