
Consider:

i = 2
feature_vector_set = []
while i < 2405 - 2:
    j = 2
    while j < 1200 - 2:
        block = diff_image[i-2:i+3, j-2:j+3]
        feature = block.flatten()
        feature_vector_set.append(feature)
        j = j + 1
    i = i + 1

diff_image is of type int16 with shape (2405, 1200). The whole loop takes 40 minutes to run, and the slowdown seems to be caused mainly by the following line:

feature_vector_set.append(feature)

Is there an alternative way to achieve the same result?

  • Use numpy. You can allocate the array all at once and fill it in, instead of growing it incrementally (a sketch of this approach follows these comments). – Barmar Nov 07 '19 at 03:25
  • The issue is **not** appending to the list, which is a very efficient operation in Python (amortized constant time). If I just keep appending to a list without all your array manipulations, it takes two seconds; see the example on repl.it: https://repl.it/repls/HarmoniousSpitefulBooleanlogic – juanpa.arrivillaga Nov 07 '19 at 04:49
  • Is it possible that it's because you .append(None), which doesn't require allocating memory, so it is way faster than mine? @juanpa.arrivillaga – jayhuashi Nov 07 '19 at 15:51
  • @Barmar I'm kind of new to Python. Where should I allocate the array? The result, feature_vector_set, is a list, which makes me confused about where to allocate the array in it. Should I allocate the array for feature before the loop? – jayhuashi Nov 07 '19 at 15:53
  • @jayhuashi it doesn't need to allocate any memory for the `None` object (which is the whole point of using it) but it does have to allocate memory for the underlying buffer as the list grows. This conclusively shows that `.append` is not your bottleneck. – juanpa.arrivillaga Nov 07 '19 at 17:18
  • @jayhuashi How exactly did you _determine_ it's `.append` that is slow? – digitalarbeiter Nov 13 '19 at 15:41
  • @digitalarbeiter Because every time I run the code, it gets stuck on this line – jayhuashi Nov 21 '19 at 04:05
  • @jayhuashi What do you mean, stuck? As in, when you interrupted your program, the stack trace showed this line? That might be because `flatten()` is probably a C extension, and those often don't react to Ctrl-C/Ctrl-Break (see https://stackoverflow.com/a/33652496/189018), so the KeyboardInterrupt is raised for the next Python line, which is the `.append()`. To really *measure* which part is slow, you need to *profile* your code: https://docs.python.org/3/library/profile.html – digitalarbeiter Nov 25 '19 at 10:09
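
A minimal sketch of the NumPy approach suggested by Barmar above, assuming the goal is one flattened 5×5 block per valid pixel position. diff_image below is only a placeholder with the shape and dtype stated in the question, and sliding_window_view requires NumPy 1.20 or newer:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Placeholder array with the shape and dtype from the question.
diff_image = np.zeros((2405, 1200), dtype=np.int16)

# Option 1: preallocate the output once and fill it in, instead of growing a list.
n_windows = (2405 - 4) * (1200 - 4)              # number of valid 5x5 positions
feature_vector_set = np.empty((n_windows, 25), dtype=np.int16)
k = 0
for i in range(2, 2405 - 2):
    for j in range(2, 1200 - 2):
        feature_vector_set[k] = diff_image[i-2:i+3, j-2:j+3].ravel()
        k += 1

# Option 2: drop the Python loops entirely (NumPy >= 1.20).
windows = sliding_window_view(diff_image, (5, 5))    # shape (2401, 1196, 5, 5)
feature_vector_set = windows.reshape(-1, 25)         # one row per 5x5 block

And a minimal sketch of the profiling step suggested in the last comment, reusing the diff_image placeholder above and assuming the loop is wrapped in a function (shrink the ranges first if the full image takes too long to profile):

import cProfile

def build_features():
    out = []
    for i in range(2, 2405 - 2):
        for j in range(2, 1200 - 2):
            out.append(diff_image[i-2:i+3, j-2:j+3].flatten())
    return out

# Sort by cumulative time to see which calls actually dominate.
cProfile.run("build_features()", sort="cumulative")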

1 Answer


Try using a deque. It is part of Python's collections module, and it uses a doubly-linked list internally.

from collections import deque

i = 2
feature_vector_set = deque()
while i < 2405 - 2:
    j = 2
    while j < 1200 - 2:
        block = diff_image[i-2:i+3, j-2:j+3]
        feature = block.flatten()
        feature_vector_set.append(feature)   # deque.append is O(1), like list.append
        j = j + 1
    i = i + 1
feature_vector_list = list(feature_vector_set)

You can find the time complexities of common operations on Python data types on the Python wiki's TimeComplexity page.

Deque documentation: https://docs.python.org/3/library/collections.html#collections.deque

  • "If you are going to append a lot of elements, a list is not the appropriate data structure for it." Incorrect. Appending to the end of a python list is very efficient, amortized constant time. It is insertion at the beginning that is inefficient (linear). – juanpa.arrivillaga Nov 07 '19 at 04:43
  • This is my understanding; please correct me if I'm wrong. When we run out of the memory allocated to a list, it reallocates memory, essentially copying everything to another place in memory. When we're appending a lot of elements, this copying takes place multiple times. Isn't that inefficient? – Neeraz Lakkapragada Nov 07 '19 at 05:00
  • It is amortized constant time, because the underlying buffer is over-allocated every time it resizes. See the link you provided yourself: `.append` for a list is constant time, O(1) – juanpa.arrivillaga Nov 07 '19 at 05:24
  • Indeed, the list is slightly faster (although they have the same algorithmic complexity); see the following benchmark: https://repl.it/repls/CriminalUnfoldedCylinder (a similar timeit sketch appears after these comments) – juanpa.arrivillaga Nov 07 '19 at 05:41
  • Thank you, man... I will modify my answer. I do know that list.append() takes O(1), but I was thinking that since the resizes take time, a deque is slightly better – Neeraz Lakkapragada Nov 07 '19 at 05:58
  • It's explained well here: https://stackoverflow.com/questions/33044883/why-is-the-time-complexity-of-pythons-list-append-method-o1 – Barmar Nov 07 '19 at 18:23
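
A minimal sketch of the kind of list-vs-deque append benchmark referenced in the comments above (the original repl.it snippets are not reproduced here). Both appends are amortized O(1), which is the point being made:

from collections import deque
from timeit import timeit

N = (2405 - 4) * (1200 - 4)   # roughly the number of appends in the question

def fill_list():
    out = []
    for _ in range(N):
        out.append(None)
    return out

def fill_deque():
    out = deque()
    for _ in range(N):
        out.append(None)
    return out

# On CPython, list.append is usually at least as fast as deque.append here.
print("list :", timeit(fill_list, number=1))
print("deque:", timeit(fill_deque, number=1))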