Python v-stacking in a loop

Question

I was going through an example in this computer-vision book and was a bit surprised by the code:

descr = []
descr.append(sift.read_features_from_file(featurefiles[0])[1])
descriptors = descr[0] #stack all features for k-means
for i in arange(1,nbr_images):
  descr.append(sift.read_features_from_file(featurefiles[i])[1])
  descriptors = vstack((descriptors,descr[i]))

To me it looks like this is copying the array over and over again and a more efficient implementation would be:

descr = []
descr.append(sift.read_features_from_file(featurefiles[0])[1])
for i in arange(1,nbr_images):
  descr.append(sift.read_features_from_file(featurefiles[i])[1])
descriptors = vstack((descr))

Or am I missing something here, and the two versions are not identical? I ran a small test:

print("ATTENTION")
print(descriptors.shape)
print("ATTENTION")
print(descriptors[1:10])

And it seems the result is different?

What do the elements of `descr` list look like? arrays? shape? — hpaulj, Dec 01 '15 at 18:11
By the way, [`range()`](https://docs.python.org/3/library/functions.html#func-range) (or even better, [`xrange()`](https://docs.python.org/2/library/functions.html#xrange) for Python2.x) is much cheaper than [`np.arange`](http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.arange.html). — ali_m, Dec 01 '15 at 20:07

score 1 · Accepted Answer · edited May 23 '17 at 11:44

You're absolutely right - repeatedly concatenating numpy arrays inside a loop is extremely inefficient. Concatenation always generates a copy, which becomes more and more costly as your array gets bigger and bigger inside the loop.
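To check the question's two versions against each other, here is a minimal sketch that uses random arrays in place of the SIFT descriptors (the shapes and the name `chunks` are made up for illustration). It stacks once inside the loop and once at the end, compares the results, and checks that `vstack` returns a fresh array instead of a view:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for per-image descriptor arrays: varying row counts, 128 columns.
chunks = [rng.random((n, 128)) for n in (3, 5, 2, 4)]

# Version 1: stack inside the loop (re-copies everything accumulated so far).
looped = chunks[0]
for chunk in chunks[1:]:
    looped = np.vstack((looped, chunk))

# Version 2: collect in a list, stack once at the end.
stacked = np.vstack(chunks)

same = np.array_equal(looped, stacked)         # do both versions agree?
shares = np.shares_memory(stacked, chunks[0])  # is the result a view of an input?
print(looped.shape, stacked.shape, same, shares)
```

On this toy data the two results are equal, so a mismatch in the real script would have to come from somewhere other than the stacking itself.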

Instead, do one of two things:

1. As you have done, store the intermediate values in a regular Python list and convert this to a numpy array outside the loop. Appending to a list is amortized O(1), whereas concatenating np.ndarrays is O(n+k), where n is the number of elements already in the array and k is the number being added.

2. If you know how large the final array will be ahead of time, you can pre-allocate it and then fill in the rows inside your for loop, e.g.:

descr = np.empty((nbr_images, nbr_features), dtype=my_dtype)
for i in range(nbr_images):
    # assumes each file yields one row of nbr_features values
    descr[i] = sift.read_features_from_file(featurefiles[i])[1]
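The snippet above fits when every file contributes exactly one row. If each image yields a different number of descriptors (an assumption about the data, not something the question states), the same idea can be sketched by summing the row counts first and then filling slices. `load_descriptors` is a hypothetical stand-in for `sift.read_features_from_file(featurefiles[i])[1]`:

```python
import numpy as np

def load_descriptors(i):
    """Hypothetical stand-in for sift.read_features_from_file(featurefiles[i])[1]."""
    rng = np.random.default_rng(i)
    return rng.random((i + 2, 128))  # i + 2 rows, 128 columns

nbr_images = 4
descr = [load_descriptors(i) for i in range(nbr_images)]

# Work out the final size, then allocate once.
total_rows = sum(d.shape[0] for d in descr)
descriptors = np.empty((total_rows, descr[0].shape[1]), dtype=descr[0].dtype)

# Copy each block into its own slice of the pre-allocated array.
start = 0
for d in descr:
    stop = start + d.shape[0]
    descriptors[start:stop] = d
    start = stop

print(descriptors.shape)
```

This does the same work as a single `np.vstack(descr)`. It is mainly worth writing out by hand when the blocks arrive one at a time and the total size is known beforehand.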

Another variant would be to use np.fromiter to lazily generate the array from an iterable object, for example in this recent question.
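As a rough illustration of the `np.fromiter` variant: it consumes an iterable and builds a one-dimensional array, so it suits one scalar per item better than whole descriptor matrices. The generator below is a made-up example. Passing `count` is optional, but it lets NumPy allocate the output once instead of growing it:

```python
import numpy as np

def squares(n):
    """Made-up generator: yields one float per item, computed lazily."""
    for i in range(n):
        yield float(i * i)

# count tells NumPy how many items to read, so the array is allocated up front.
values = np.fromiter(squares(5), dtype=float, count=5)
print(values)
```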
