I have a function like this in C (in pseudo-ish code, dropping the unimportant parts):
int func(int s, int x, int* a, int* r) {
int i;
// do some stuff
for (i=0;i<a_really_big_int;++i) {
if (s) r[i] = x ^ i;
else r[i] = x ^ a[i];
// and maybe a couple other ways of computing r
// that are equally fast individually
}
// do some other stuff
}
This code gets called so much that this loop is actually a speed bottleneck in the code. I am wondering a couple things:
Since the switch
s
is a constant in the function, will good compilers optimize the loop so that the branch isn't slowing things down all the time?If not, what is a good way to optimize this code?
====
Here is an update with a fuller example:
int func(int s,
int start,int stop,int stride,
double *x,double *b,
int *a,int *flips,int *signs,int i_max,
double *c)
{
int i,k,st;
for (k=start; k<stop; k += stride) {
b[k] = 0;
for (i=0;i<i_max;++i) {
/* this is the code in question */
if (s) st = k^flips[i];
else st = a[k]^flips[i];
/* done with code in question */
b[k] += x[st] * (__builtin_popcount(st & signs[i])%2 ? -c[i] : c[i]);
}
}
}
EDIT 2:
In case anyone is curious, I ended up refactoring the code and hoisting the whole inner for loop (with i_max
) outside, making the really_big_int
loop be much simpler and hopefully easy to vectorize! (and also avoiding doing a bunch of extra logic a zillion times)