0

For a nested for loop, can the case of unknown number of loops be as fast as the case of known number of loops?

Here are some old answers: variable nested for loops

Is BugMeNot2013's answer as fast as possible?

Here are my attempts. https://godbolt.org/g/W4KSlC

code:

#include <iostream>
inline void loop(int depth, int max_depth,
         int* s, int* st, int* c, int* A){
    //s = shape
    //st = stride
    //c = counter
    if (depth != max_depth){
        for(c[depth] = 0; c[depth] < s[depth]; ++c[depth]){
            loop(depth+1, max_depth, s, st, c, A);
        }
    } else {
        A[st[0]*c[0] + st[1]*c[1] + st[2]*c[2]]*=2;
    }
}


int main(void){
    int A[100];
    int s[] = {2,5,10};
    int st[] = {50,10,1};
    int c[] = {0,0,0};
    //Version 1.
    for(c[0] = 0; c[0] < s[0]; ++c[0])
        for(c[1] = 0; c[1] < s[1]; ++c[1])
            for(c[2] = 0; c[2] < s[2]; ++c[2])
                A[st[0]*c[0] + st[1]*c[1] + st[2]*c[2]]*=2;

    //Version 2 
    //(this should be fastest)
    int size = s[0]*s[1]*s[2];
    for(int i = 0; i < size; ++i) A[i] *= 2;

    //Version 3 (fail. so many function calls...)
    loop(0, 2, s, st, c, A);
    for(int i = 0; i < 100; ++i) std::cout << A[i];
}

GCC godbolt shows the recurive function loop makes lots of function calls.

Community
  • 1
  • 1
hamster on wheels
  • 2,771
  • 17
  • 50
  • To speed up a program, reduce the number of jumps, branches or function calls. These may cause the processor to waste time making decisions about whether or not to reload the instruction cache / pipeline. – Thomas Matthews Sep 27 '16 at 22:03
  • 2
    IMO, the issue is not the number of nested loop, but the random access of the array. You want to design your algorithms to make efficient use of the processor's data cache. Reloading the data cache wastes time. If your cache can only hold 16 slots, and you are accessing indices 0, 36, 78, 5, 24, 56, you are causing many reloads of the data cache. – Thomas Matthews Sep 27 '16 at 22:07
  • For filling arrays with constants, look at the `std::fill` function. This library function could be implemented by the compiler to use specialized efficient processor instructions. – Thomas Matthews Sep 27 '16 at 22:09
  • @ThomasMatthews The indexing in `A` in the nested loop is linear. A good optimizer collapses that into a linear walk of `A` without any subscript calculation. – 1201ProgramAlarm Sep 27 '16 at 23:09

0 Answers0