Is there a way to avoid loops to make this code faster?

Question

Is there a way to avoid loops to make this code faster ?

"var" is the desired result.

"AA" and "BB" are vectors whose values are known.

The four principal lines of this code are based on the logic: 00,01,10,11 ( 0 for <= and 1 for > )

for i=3:1:2000

    for j=1:50011
        if AA(i) == j && BB(i)<=BB(i-1) && BB(i-1)<=BB(i-2)
           var(i)=j;
        else
            if AA(i) == j && BB(i)<=BB(i-1) && BB(i-1)>BB(i-2)
               var(i)=j+50011;
            else
                if AA(i) == j && BB(i)>BB(i-1) && BB(i-1)<=BB(i-2)
                   var(i)=j+2*50011;
                else
                    if AA(i) == j && BB(i)>BB(i-1) && BB(i-1)>BB(i-2)
                       var(i)=j+3*50011;
                    end
                end
            end
        end
    end

end

score 3 · Accepted Answer · edited May 23 '17 at 11:59

I managed to vectorize your code into one line of code! Going from a couple of seconds down to under a millisecond:

out = [0 0 AA(3:end) + ([1 2] * (diff(BB(hankel(1:3, 3:numel(BB)))) > 0)).*50011];

To understand how to get there, let's incrementally improve the original code.

1)

First we start with the double-loop you have:

tic
var0 = zeros(size(AA));
for i=3:numel(AA)
    for j=1:N
        if AA(i) == j && BB(i)<=BB(i-1) && BB(i-1)<=BB(i-2)
            var0(i)=j;
        else
            if AA(i) == j && BB(i)<=BB(i-1) && BB(i-1)>BB(i-2)
                var0(i)=j+50011;
            else
                if AA(i) == j && BB(i)>BB(i-1) && BB(i-1)<=BB(i-2)
                    var0(i)=j+2*50011;
                else
                    if AA(i) == j && BB(i)>BB(i-1) && BB(i-1)>BB(i-2)
                        var0(i)=j+3*50011;
                    end
                end
            end
        end
    end
end
toc

2)

As @SpamBot noted, the nested if/else statements can be simplified by chaining them instead. Also you're evaluating the same test AA(i)==j multiple times. If the test is false, the whole for-loop is skipped. So we can eliminate this second for-loop, and directly use j=AA(i).

Here is new the code:

tic
var1 = zeros(size(AA));
for i=3:numel(AA)
    j = AA(i);
    if BB(i)<=BB(i-1) && BB(i-1)<=BB(i-2)
        var1(i) = j;
    elseif BB(i)<=BB(i-1) && BB(i-1)>BB(i-2)
        var1(i) = j + 50011;
    elseif BB(i)>BB(i-1) && BB(i-1)<=BB(i-2)
        var1(i) = j + 2*50011;
    elseif BB(i)>BB(i-1) && BB(i-1)>BB(i-2)
        var1(i) = j + 3*50011;
    end
end
toc

This is a huge improvement, and the code will run in a fraction of the original time. Still we can do better...

3)

As you mentioned in your question, the if/else conditions correspond to the pattern 00, 01, 10, 11 where 0/1 or false/true are the results of performing a binary x>y test over adjacent numbers.

Using this idea, we get the the following code:

tic
var2 = zeros(size(AA));
for i=3:numel(AA)
    val = (BB(i) > BB(i-1)) * 10 + (BB(i-1) > BB(i-2));
    switch (val)
        case 00
            k = 0;
        case 01
            k = 50011;
        case 10
            k = 2*50011;
        case 11
            k = 3*50011;
    end
    var2(i) = AA(i) + k;

end
toc

4)

Let's replace that switch statement with a table-lookup operation. This gives us this new version:

tic
v = [0 1 2 3] * 50011;  % 00 01 10 11
var3 = zeros(size(AA));
for i=3:numel(AA)
    var3(i) = AA(i) + v((BB(i) > BB(i-1))*2 + (BB(i-1) > BB(i-2)) + 1);
end
toc

5)

In this final version, we can get rid of the loop entirely by noting that each iteration accesses the slice BB(i-2:i) in a sliding-window manner. We can neatly use the hankel function to create a sliding window over BB (each returned as a column).

Next we use diff to perform vectorized comparisons, then map the resulting 0/1 of the two tests as [0 1 2 3]*50011 values. Lastly we add the vector AA appropriately.

This gives us our final one-liner, completely vectorized:

tic
var4 = [0, 0, AA(3:end) + ([1 2] * (diff(BB(hankel(1:3, 3:numel(BB)))) > 0)).*50011];
toc

Comparison

To verify the above solutions, I used the following random vectors as testing data:

N = 50011;
AA = randi(N, [1 2000]);
BB = randi(N, [1 2000]);

assert(isequal(var0,var1,var2,var3,var4))

I get the following times matching the solutions in the order presented:

>> myscript  % tested in MATLAB R2014a
Elapsed time is 1.663210 seconds.
Elapsed time is 0.000111 seconds.
Elapsed time is 0.000099 seconds.
Elapsed time is 0.000089 seconds.
Elapsed time is 0.000417 seconds.

>> myscript  % tested in MATLAB R2015b (with the new execution engine)
Elapsed time is 2.816541 seconds.
Elapsed time is 0.000233 seconds.
Elapsed time is 0.000158 seconds.
Elapsed time is 0.000157 seconds.
Elapsed time is 0.000339 seconds.

Hope this post wasn't too long, I just wanted to show how to tackle this type of problems by making incremental changes.

Now take your pick at which solution you like best :)

Thank you very much for your unswer! But, if I understood your comparison, (except the first version), the final vesion requires more time than the three other !!! — bzak, Mar 30 '16 at 14:48
We're talking in the order of under a millisecond here, there's not any significant difference... I suggest you go with the solution you feel most comfortable with (#2 above is perfectly fine). I admit, in #3 to #5 the focus was more on getting a shorter and fully vectorized solution than any dramatic performance improvement. — Amro, Mar 30 '16 at 15:36
You see here in the MATLAB tag, we often take this sort of questions as a challenge to write the shortest possible vectorized code, that doesn't always mean the most readable or fastest :) — Amro, Mar 30 '16 at 15:39

SpamBot · Answer 2 · 2016-03-30T11:22:58.057

I think the most important improvement would be to evaluate AA(i)==j only once when looping over j.

Also, you are potentially overwriting var(i) very often even though only the last overwrite is relevant. Consider taking only one j==AA(i) and do the if-else only for that. Essentially, you are searching for j within AA, use find(AA == j, 1) instead.

As a side note, you have unnecessary if/end blocks which could directly go into the parent else block.

score 1 · Answer 3 · answered Mar 30 '16 at 18:15

Here's one vectorized approach mainly using logical indexing -

%// Get size parameter
N = numel(AA)-2;
limit = 50011;

%// Get differentiation across BB
dB = diff(BB);

%// Construct an ID array with different valus for various conditions
id = ones(N,1);
id((dB(2:end) <= 0) & (dB(1:end-1) > 0)) = 2;
id((dB(2:end) > 0) & (dB(1:end-1) <= 0)) = 3;
id((dB(2:end) > 0) & (dB(1:end-1) > 0)) = 4;

%// Get scaled values as used under various IF statements
vals = ((id-1)*50011) + AA(3:end);

%// Get a valid mask that would be used to set values from vals into output
valid_mask = AA(3:end) <= limit;

%// Setup output array and selectively set values from vals using valid_mask
var_out = zeros(1,N);
var_out(valid_mask) = vals(valid_mask);

Please note that the original output would have the first two elements as zeros always. The output with the proposed solution skips the first two elements to avoid that redundancy. If needed for consistency with the old paradigm, pad with two zeros at the start -

final_out = [0 0 var_out];

Is there a way to avoid loops to make this code faster?

3 Answers3

1)

2)

3)

4)

5)

Comparison