This has me baffled and intrigued. Why is this code
void maxArray(double* x, double* y) {
    for (int i = 0; i < 65536; i++) {
        x[i] = ((y[i] > x[i]) ? y[i] : x[i]);
    }
}
...faster than this code?
void maxArray(double* x, double* y) {
    for (int i = 0; i < 65536; i++) {
        if (y[i] > x[i]) x[i] = y[i];
    }
}
For the record, the assembly generated for the first version is identical to that of this expanded version:
inline double fn(double a, double b) {
    if (a > b) {
        return a;
    } else {
        return b;
    }
}
void maxArray(double* x, double* y) {
    for (int i = 0; i < 65536; i++) {
        x[i] = fn(y[i], x[i]);
    }
}
I get the difference. The first one assigns x[i] the result of a conditional expression, and the middle one conditionally assigns x[i]. Both have conditions though, so don't both have branches? Is it because the version that always assigns (the first form and its expanded equivalent) is optimized into the vector max instruction, while the conditional-assignment version is, for some reason, not recognized as a max function?
Compiled with gcc 10.3 on x86_64, using -Ofast -march=native.
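
To make the comparison concrete, here is a minimal sketch that puts the two loop forms side by side and checks that they leave the same values in x. The names max_select, max_cond_store, variants_agree and check_variants, and the test data, are hypothetical and not part of the code above. It only confirms that the results match. It does not measure speed or show which instructions the compiler emits.

```c
#include <assert.h>

#define N 65536

/* Select form (first version above): stores to x[i] on every iteration. */
static void max_select(double *x, const double *y) {
    for (int i = 0; i < N; i++) {
        x[i] = (y[i] > x[i]) ? y[i] : x[i];
    }
}

/* Conditional-store form (second version above): stores to x[i]
   only when y[i] is larger. */
static void max_cond_store(double *x, const double *y) {
    for (int i = 0; i < N; i++) {
        if (y[i] > x[i]) x[i] = y[i];
    }
}

/* Runs both forms on identical, made-up input data.
   Returns 1 if they leave identical contents in x, 0 otherwise. */
int variants_agree(void) {
    static double xa[N], xb[N], y[N];
    for (int i = 0; i < N; i++) {
        xa[i] = xb[i] = (double)((i * 37) % 101);
        y[i] = (double)((i * 53) % 97);
    }
    max_select(xa, y);
    max_cond_store(xb, y);
    for (int i = 0; i < N; i++) {
        if (xa[i] != xb[i]) return 0;
    }
    return 1;
}

/* Convenience wrapper: aborts if the two forms ever disagree. */
void check_variants(void) {
    assert(variants_agree());
}
```

Calling check_variants() from a main function is enough to run the check. The one source-level difference the sketch makes visible is that the select form writes every element of x, while the conditional-store form writes only the elements that change.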