For nested for loops, can the case where the number of loops is only known at run time be as fast as the case where it is known at compile time?
Here are some old answers: variable nested for loops
Is BugMeNot2013's answer as fast as possible?
Here are my attempts: https://godbolt.org/g/W4KSlC
Code:
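#include <iostream>

inline void loop(int depth, int max_depth,
                 const int* s, const int* st, int* c, int* A){
    //s = shape
    //st = stride
    //c = counter
    if (depth != max_depth){
        for(c[depth] = 0; c[depth] < s[depth]; ++c[depth]){
            loop(depth+1, max_depth, s, st, c, A);
        }
    } else {
        //innermost level: flat offset = sum of st[d]*c[d] over all max_depth dimensions
        int offset = 0;
        for(int d = 0; d < max_depth; ++d) offset += st[d]*c[d];
        A[offset] *= 2;
    }
}

int main(){
    int A[100];
    for(int i = 0; i < 100; ++i) A[i] = i; //initialize: reading uninitialized values is undefined behavior
    int s[] = {2,5,10};
    int st[] = {50,10,1};
    int c[] = {0,0,0};

    //Version 1.
    for(c[0] = 0; c[0] < s[0]; ++c[0])
        for(c[1] = 0; c[1] < s[1]; ++c[1])
            for(c[2] = 0; c[2] < s[2]; ++c[2])
                A[st[0]*c[0] + st[1]*c[1] + st[2]*c[2]] *= 2;

    //Version 2
    //(this should be fastest; equivalent to Version 1 only because the strides are contiguous row-major)
    int size = s[0]*s[1]*s[2];
    for(int i = 0; i < size; ++i) A[i] *= 2;

    //Version 3 (fail. so many function calls...)
    //max_depth must equal the number of dimensions (3); passing 2 skips the innermost loop
    loop(0, 3, s, st, c, A);

    for(int i = 0; i < 100; ++i) std::cout << A[i] << ' ';
    std::cout << '\n';
}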
GCC's output on Godbolt shows that the recursive function loop (Version 3) still makes many function calls.