
I was going through an example in this computer-vision book and was a bit surprised by the code:

    descr = []
    descr.append(sift.read_features_from_file(featurefiles[0])[1])
    descriptors = descr[0]  # stack all features for k-means
    for i in arange(1,nbr_images):
        descr.append(sift.read_features_from_file(featurefiles[i])[1])
        descriptors = vstack((descriptors,descr[i]))

To me, it looks like this copies the array over and over again, and a more efficient implementation would be:

    descr = []
    descr.append(sift.read_features_from_file(featurefiles[0])[1])
    for i in arange(1,nbr_images):
        descr.append(sift.read_features_from_file(featurefiles[i])[1])
    descriptors = vstack((descr))

Or am I missing something here, and the two snippets are not equivalent? I ran a small test:

    print("ATTENTION")
    print(descriptors.shape)
    print("ATTENTION")
    print(descriptors[1:10])

And the output seems to differ?


ali_m
mptevsion
  • What do the elements of `descr` list look like? arrays? shape? – hpaulj Dec 01 '15 at 18:11
  • By the way, [`range()`](https://docs.python.org/3/library/functions.html#func-range) (or even better, [`xrange()`](https://docs.python.org/2/library/functions.html#xrange) for Python2.x) is much cheaper than [`np.arange`](http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.arange.html). – ali_m Dec 01 '15 at 20:07

1 Answer


You're absolutely right - repeatedly concatenating numpy arrays inside a loop is extremely inefficient. Concatenation always generates a copy, which becomes more and more costly as your array gets bigger and bigger inside the loop.
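To check that the two versions from the question really build the same array, here is a minimal sketch. It uses random arrays as hypothetical stand-ins for the per-image SIFT descriptors, assuming each image yields a 2-D array with 128 columns and a varying number of rows.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for per-image descriptor arrays:
# different row counts, same number of columns (128 assumed here).
descr = [rng.random((n, 128)) for n in (5, 3, 7, 4)]

# Version from the book: grow the array inside the loop.
# Every vstack call allocates a new array and copies all rows so far.
stacked_in_loop = descr[0]
for d in descr[1:]:
    stacked_in_loop = np.vstack((stacked_in_loop, d))

# Version from the question: collect in a list, stack once at the end.
stacked_once = np.vstack(descr)

# Both strategies should yield the same shape and the same values.
same = np.array_equal(stacked_in_loop, stacked_once)
print(stacked_once.shape, same)
```

If a comparison like this reports a difference on real data, the likelier culprit is how the two runs were set up (for example, different input files) rather than the stacking strategy itself.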

Instead, do one of two things:

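  1. As you have done, store the intermediate values in a regular Python list and convert this to a numpy array outside the loop. Appending to a list is amortized O(1), whereas concatenating an np.ndarray of n elements with one of k elements is O(n+k), because both are copied into a newly allocated array.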

  2. If you know how large the final array will be ahead of time, you can pre-allocate it and then fill in the rows inside your for loop, e.g.:

    # Assumes each file yields exactly nbr_features values (one row per image)
    descr = np.empty((nbr_images, nbr_features), dtype=my_dtype)
    for i in range(nbr_images):
        descr[i] = sift.read_features_from_file(featurefiles[i])[1]
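If each image contributes a block of several rows rather than a single row, the pre-allocation idea can still be applied by filling slices. The following is only a sketch under assumptions: `blocks` holds random stand-in arrays instead of real descriptors, and the 128-column width is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-ins for per-image descriptor blocks of varying height.
blocks = [rng.random((n, 128)) for n in (5, 3, 7)]

# Allocate the final array once, using the known total row count.
total_rows = sum(b.shape[0] for b in blocks)
descriptors = np.empty((total_rows, 128), dtype=blocks[0].dtype)

# Copy each block into its own slice; no intermediate arrays are created.
start = 0
for b in blocks:
    stop = start + b.shape[0]
    descriptors[start:stop] = b
    start = stop

print(descriptors.shape)
```

This only pays off when the row counts are known (or cheap to obtain) before the data is loaded. If every block has to be read into memory first anyway, a single `np.vstack` over the list does the same job in one call.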
    

Another variant would be to use np.fromiter to lazily generate the array from an iterable object, for example in this recent question.
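A rough sketch of the `np.fromiter` variant is shown here. Since `np.fromiter` builds a 1-D array, this version flattens the rows with `itertools.chain.from_iterable` and reshapes afterwards. The generator `descriptor_rows` is a hypothetical stand-in for whatever produces the rows, and the sizes are made up for illustration.

```python
import itertools
import numpy as np

def descriptor_rows(n_rows, n_cols):
    """Hypothetical generator yielding one row of values at a time."""
    for i in range(n_rows):
        yield [float(i * n_cols + j) for j in range(n_cols)]

n_rows, n_cols = 4, 3

# np.fromiter consumes scalars lazily and returns a 1-D array.
flat = np.fromiter(
    itertools.chain.from_iterable(descriptor_rows(n_rows, n_cols)),
    dtype=np.float64,
    count=n_rows * n_cols,  # optional: lets NumPy allocate the buffer once
)

# Restore the 2-D layout: one row per yielded item.
descriptors = flat.reshape(n_rows, n_cols)
print(descriptors.shape)
```

Passing `count` is optional, but without it NumPy has to grow its buffer as the iterator is consumed.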

ali_m