Appending to NumPy arrays in a loop is very inefficient. np.append does not grow the array in place: every call allocates a new array and copies all of the existing elements into it, so each step costs time proportional to the current length. Depending on the application, there are much better strategies.
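A quick way to see the copying behaviour is to check what np.append returns. This is only a small sketch; the names a and b are made up for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.append(a, 4.0)  # returns a new array; a itself is not modified

print(a.shape)                 # the original still has 3 elements
print(b.shape)                 # the returned array has 4
print(np.shares_memory(a, b))  # the two arrays do not share a buffer
```

Because the result is a separate array, calling np.append inside a loop repeats this allocate-and-copy step on every iteration.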
If you know the length in advance, it is best to pre-allocate the array using a function like np.ones, np.zeros, or np.empty.
import numpy as np

desired_length = 500
results = np.empty(desired_length)  # uninitialized; every slot is filled below
for i in range(desired_length):
    results[i] = i**2
If you don't know the length, it's probably more efficient to keep your results in a regular list and convert it to an array afterwards.
results = []
while condition:
    a = do_stuff()
    results.append(a)
results = np.array(results)
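For a runnable version of the same pattern, here is a sketch where the final length really isn't known up front. The Collatz sequence is just an illustrative stand-in for condition and do_stuff, and the function name collatz is my own.

```python
import numpy as np

def collatz(n):
    """Return the Collatz sequence starting at n as a NumPy array."""
    results = [n]  # plain Python list: appending is cheap
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        results.append(n)
    return np.array(results)  # convert once, at the end

seq = collatz(6)
print(seq)
```

The list absorbs all the appends, and the single np.array call at the end does the only array allocation.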
Here are some timings on my computer.
def pre_allocate():
    results = np.empty(5000)
    for i in range(5000):
        results[i] = i**2
    return results

def list_append():
    results = []
    for i in range(5000):
        results.append(i**2)
    return np.array(results)

def numpy_append():
    results = np.array([])
    for i in range(5000):
        # np.append returns a new array, so the result must be assigned back
        results = np.append(results, i**2)
    return results
%timeit pre_allocate()
# 100 loops, best of 3: 2.42 ms per loop
%timeit list_append()
# 100 loops, best of 3: 2.5 ms per loop
%timeit numpy_append()
# 10 loops, best of 3: 48.4 ms per loop
So you can see that both pre-allocating and using a list then converting are much faster than np.append, by roughly a factor of 20 in these timings.
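%timeit is an IPython magic, so if you want to repeat the comparison in a plain script, something like the following sketch with the standard timeit module should work. The constant N and the choice of 10 repetitions are my own; numpy_append here assigns the result of np.append back so that the array actually grows. Absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np

N = 5000

def pre_allocate():
    results = np.empty(N)
    for i in range(N):
        results[i] = i**2
    return results

def list_append():
    results = []
    for i in range(N):
        results.append(i**2)
    return np.array(results)

def numpy_append():
    results = np.array([])
    for i in range(N):
        results = np.append(results, i**2)  # new array on every call
    return results

for func in (pre_allocate, list_append, numpy_append):
    # Average over 10 calls to smooth out noise
    seconds = timeit.timeit(func, number=10) / 10
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```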