Let N
be a compile-time unsigned integer.
GCC can optimize
unsigned sum = 0;
for(unsigned i=0; i<N; i++) sum += a; // a is an unsigned integer
to simply a*N
. This can be understood from modular arithmetic: since (a%k + b%k)%k = (a+b)%k
, repeated addition of unsigned integers (which wrap modulo 2^32) is equivalent to multiplication modulo 2^32.
However GCC will not optimize
float sum = 0;
for(unsigned i=0; i<N; i++) sum += a; // a is a float
to a*(float)N
.
But by using associative math, e.g. with -Ofast
, I discovered that GCC can reduce this to order log2(N)
steps. E.g. for N=8
it can do the sum in three additions:
sum = a + a
sum = sum + sum // (a + a) + (a + a)
sum = sum + sum // ((a + a) + (a + a)) + ((a + a) + (a + a))
Though at some point after N=16
GCC goes back to doing N-1
sums.
My question is why does GCC not do a*(float)N
with -Ofast
?
Instead of being O(N)
or O(Log(N))
it could be simply O(1)
. Since N
is known at compile time it's possible to determine if N
fits in a float. And even if N
is too large for a float it could do sum = a*(float)(N & 0x0000ffff) + a*(float)(N & 0xffff0000)
. In fact, I did a little test to check the accuracy and a*(float)N
is more accurate anyway (see the code and results below).
//gcc -O3 foo.c
//don't use -Ofast or -ffast-math or -fassociative-math
#include <stdio.h>

float sumf(float a, int n)
{
    float sum = 0;
    for (int i = 0; i < n; i++) sum += a;
    return sum;
}

float sumf_kahan(float a, int n)
{
    float sum = 0;
    float c = 0;
    for (int i = 0; i < n; i++) {
        float y = a - c;
        float t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}

float mulf(float a, int n)
{
    return a*n;
}

int main(void)
{
    int n = 1<<24;
    float a = 3.14159;
    float t1 = sumf(a,n);
    float t2 = sumf_kahan(a,n);
    float t3 = mulf(a,n);
    printf("%f %f %f\n", t1, t2, t3);
}
The result is 61848396.000000 52707136.000000 52707136.000000
, which shows that the multiplication and the Kahan summation give the same result. I think this shows that the multiplication is more accurate than the simple sum.