Optimize an algorithm which is executed a lot of times

Question

I have a couple of for loops and the innermost one will be executed a lot of times. This innermost loop contains some heavy calculations using numpy, so all of this is taking a lot of time. So I am trying to optimize the innermost loop.

The most inner-loop contains the following logic:

I have two numpy-arrays (much larger in real life):

left = np.asarray([0.4, 0.2, 0.2, 0.7, 0.6, 0.2, 0.3])
right= np.asarray([0.2, 0.7, 0.3, 0.2, 0.1, 0.9, 0.7])

These are compared to a threshold to see if I should go left or right. If left[x] > 0.55 and right[x] < 0.45 I want to go left. If left[x] < 0.55 and right[x] > 0.45 I want to go right. I have solved this by creating two boolean arrays, one for left and one for right, according to:

leftListBool = ((left > 0.55)*1 + (right < 0.45)*1 - 1) > 0
rightListBool = ((right > 0.55)*1 + (left < 0.45)*1 - 1) > 0

Which for the above example gives me:

leftListBool = [False False False  True  True False False]
rightListBool = [False  True False False False  True  True]

But I am not allowed to go left if I went left the last time (and same for right). Therefore I loop these list according to:

wentLeft = False
wentRight = False
a = 0
for idx, v in enumerate(leftListBool):
    if leftListBool[idx] and not wentRight:
        a += DoAThing(idx)
        wentLeft = False
        wentRight = True
    elif rightListBool[idx] and not wentLeft:
        a += DoAnotherThing(idx)
        wentLeft = True
        wentRight = False

Where DoAThing() and DoAnotherThing() simply fetches a value from a numpy-array.

This is as far as I have come in terms of optimization (it was much worse before). Note that I need to do DoAThing() and DoAnotherThing() in the correct orders, since they depend on the previous value.

What have I tried?

My first idea was to create a unified list of leftListbool and rightListBool, which would look like (left = 1 and right = -1):

unified = [0 1 0 -1 -1 1 1]

But I am stuck to do this in a more optimal way than:

buyListBool.astype(int)-sellListBool.astype(int)

But even if I achieve this, I need to only include the first value, if for example I have two 1 following each other, which would result in:

unified = [0 1 0 -1 0 1 0]

In which case I could reduce the for-loop to:

for i in unified:
    if i == 1:
        a += DoAThing(a)
    elif i == -1:
        a += DoAnotherThing(a)

But even this for-loop can mayby be optimized using some numpy-magic which I haven't figured out yet.

Complete runable code:

start = time.time()

topLimit = 0.55
bottomLimit = 0.45

for outI in range(200):
    for midI in range(200):
        topLimit = 0.55
        bottomLimit = 0.45
        res = np.random.rand(200,3)
        left = res[:,0]        
        right = res[:,1]
        valList = res[:,2]

        #These two statements can probably be optimized 
        leftListBool = ((left > topLimit)*1 + (right < bottomLimit)*1 - 1) > 0
        rightListBool = ((right > topLimit)*1 + (left < bottomLimit)*1 - 1) > 0

        wentLeft = False
        wentRight = False
        a=0
        #Hopefully this loop can be optimized
        for idx, v in enumerate(leftListBool):
            if leftListBool[idx] and not wentRight:
                a += valList[idx]
                wentLeft = False
                wentRight = True
            elif rightListBool[idx] and not wentLeft:
                a += valList[idx]
                wentLeft = True
                wentRight = False

end = time.time()
print(end - start)

I find this question as a good one for [codereview](https://codereview.stackexchange.com/) — , Jun 06 '17 at 12:05
`leftListBool = ((left > 0.55)*1 + (right < 0.45)*1 - 1) > 0` this could be `leftListBool = (left > 0.55 AND right < 0.45)` for example — Ma0, Jun 06 '17 at 12:07
If the processing is as simple as that, I don't think that any micro-optimization can have a significant impact. And as you left DoAThing and DoAnotherThing as black boxes, we cannot judge if these represent significant processing. — , Jun 06 '17 at 12:14
what exactly is `val`? I assumed that's a typo and should be `leftListBool` like in a previous example. But it would be good if you corrected it (or provided how it's generated). — MSeifert, Jun 06 '17 at 12:29
@Ev.Kounis can i realy do `left > 0.55 AND right < 0.45` if these are numpy arrays? — Cleared, Jun 06 '17 at 13:07
@MSeifert Correct, it was a typo and should be `leftListBool`, i have updated the euqstion — Cleared, Jun 06 '17 at 13:08

MSeifert · Answer 1 · 2017-06-06T12:27:16.940

If you need to loop over your sequence and you care about performance you shouldn't use numpy.arrays. NumPy arrays are great when NumPy can do the loop, but if you have to loop over it yourself it will be slow (I covered the details why iterations over arrays are quite slow in another answer recently, if you want to take a look: "convert np array to a set takes too long").

You could simply use tolist and zip to avoid the iterating numpy-array overhead:

import time
import numpy as np

start = time.time()

topLimit = 0.55
bottomLimit = 0.45

for outI in range(200):
    for midI in range(200):
        topLimit = 0.55
        bottomLimit = 0.45
        res = np.random.rand(200,2)
        left = res[:,0].tolist()      # tolist!
        right = res[:,1].tolist()     # tolist!

        wentLeft = False
        wentRight = False
        a=0

        for leftitem, rightitem in zip(left, right):
            if leftitem > topLimit and rightitem < bottomLimit and not wentRight:
                wentLeft, wentRight = False, True
            elif rightitem > topLimit and leftitem < bottomLimit and not wentLeft:
                wentLeft, wentRight = True, False

end = time.time()
print(end - start)

This reduced the runtime by 30% on my computer.

You could also do the tolist conversion lateron (which may or may not be faster):

start = time.time()

topLimit = 0.55
bottomLimit = 0.45

for outI in range(200):
    for midI in range(200):
        topLimit = 0.55
        bottomLimit = 0.45
        res = np.random.rand(200,2)
        left = res[:,0]     
        right = res[:,1]

        # use tolist after the comparisons
        leftListBool = ((left > topLimit) & (right < bottomLimit)).tolist()
        rightListBool = ((right > topLimit) & (left < bottomLimit)).tolist()

        wentLeft = False
        wentRight = False
        a=0
        #Hopefully this loop can be optimized
        for idx in range(len(leftListBool)):  # avoid direct iteration over an array
            if leftListBool[idx] and not wentRight:
                #a += DoAThing(a)
                wentLeft = False
                wentRight = True
            elif rightListBool[idx] and not wentLeft:
                #a += DoAnotherThing(a)
                wentLeft = True
                wentRight = False

end = time.time()
print(end - start)

That was roughly as fast as the other approach but when left and right get much bigger than 200 elements it could become much faster.

However that was just based on the algorithm without knowledge of DoAThing and DoAnotherThing. It's possible that you could construct them in a way that allowed vectorized operations (which could speed it up by an order of magnitude without using lists). That's much tougher though and I don't know what these functions are doing.

Perfect, thanks for the information about iterating over numpy-arrays, and thanks for the link. This was exactly the type of information I was looking. I have updated my question with information about `DoAThing` and `DoAnotherThing`. They basically fetch a value from an array, which will fit in the zip according to your solution. I will give it a try then accept your answer if nothing else shows up. — Cleared, Jun 06 '17 at 13:11
NB: I think it would be more appropriate to ask a new question including your `valList`. Changing non-trivial things in a question after it received answers isn't particular helpful because it invalidates the existing answers (trivial edits like typos are okay). — MSeifert, Jun 06 '17 at 14:50
Thanks for the comment, i will consider that (and in my future questions). But since i felt that your answer was still valid i thought that it was ok. I just wantet to point out that i could not vectorize `DoAThing` and `DoAnotherThing` (which, after reading my question, i realised i failed doing). — Cleared, Jun 06 '17 at 15:42

score 1 · Accepted Answer · answered Jun 06 '17 at 16:16

Based on the updated questions I'll present a way to vectorize the code:

import time

start = time.time()

topLimit = 0.55
bottomLimit = 0.45

for outI in range(200):
    for midI in range(200):
        topLimit = 0.55
        bottomLimit = 0.45
        res = np.random.rand(200,3)
        left = res[:,0]        
        right = res[:,1]
        valList = res[:,2]

        # Arrays containing where to go left and when to go right
        leftListBool = ((left > topLimit) & (right < bottomLimit))
        rightListBool = ((right > topLimit) & (left < bottomLimit))

        # Exclude all points that are neither right or left
        common = leftListBool | rightListBool
        valList = valList[common]
        leftListBool = leftListBool[common]
        rightListBool = rightListBool[common]

        # Remove the values where you would go right or left multiple times in a row
        leftListBool[1:] &= leftListBool[1:] ^ leftListBool[:-1]
        rightListBool[1:] &= rightListBool[1:] ^ rightListBool[:-1]
        valList = valList[leftListBool | rightListBool]

        # Just use np.sum to calculate the sum of the remaining items
        a = np.sum(valList)

end = time.time()
print(end - start)

The inner loop is completly vectorized and the approach is (on my computer) 3 times faster than your original code. Let me know if I need to add more explanation about some parts. The ^ (xor operator) is just a more performant way of np.diff that works only for boolean arrays.

Thanks for the effort and answer, i really appreciate it. I tried to make the runnable code as easy and stripped down as i could, but i realise i made it to stripped down. The `valList[idx]` was just supposed to represent an arbitrary operation that has a very little impact on the overall performance that cannot be vectorized. I have updated to make it represent my case better. In each iteration, i fetch a value from a list, and the position depends on my last fetched value. I will post another question about how i can vectorize my `DoAThing(a)` and give more details there — Cleared, Jun 06 '17 at 17:01
@Cleared Ok, please ask a new question including all relevant details and rollback this question. It doesn't make sense to do incremental answer -> edit -> answer -> edit. You're constantly invalidating answers with these changes. — MSeifert, Jun 06 '17 at 17:12
I have rolled back the question to the state that the answers apply to (so changes only includes bug-fix for code). I will ask a new question regarding the vectorisation. Thanks for the feedback — Cleared, Jun 06 '17 at 17:17
I have asked the new question https://stackoverflow.com/questions/44396618/vectorize-or-optimize-an-loop-where-each-iteration-depends-on-the-state-of-the-p now — Cleared, Jun 06 '17 at 17:50

Optimize an algorithm which is executed a lot of times

The most inner-loop contains the following logic:

What have I tried?

Complete runable code:

2 Answers2