Consider such a C program
int fn(int n){
int sum = 0;
for(int i = 1; i <= n; i++)
sum += i*i;
return sum;
}
int main(){
int n;
scanf("%d",&n);
printf("%d\n",fn(n));
return 0;
}
In this program, I use a loop to calculate the faulhaber's formula.
In math, we know that for any k-power sum, we have a formula to calculate it in O(1) time.
However, compilers can find this is a faulhaber's formula and calculate it very efficently. The above program can be translated into IR with O2 like this
define dso_local i32 @fn(i32 %0) local_unnamed_addr #0 {
%2 = icmp slt i32 %0, 1
br i1 %2, label %22, label %3
3: ; preds = %1
%4 = add nsw i32 %0, -1
%5 = zext i32 %4 to i33
%6 = add nsw i32 %0, -2
%7 = zext i32 %6 to i33
%8 = mul i33 %5, %7
%9 = add nsw i32 %0, -3
%10 = zext i32 %9 to i33
%11 = mul i33 %8, %10
%12 = lshr i33 %11, 1
%13 = trunc i33 %12 to i32
%14 = mul i32 %13, 1431655766
%15 = lshr i33 %8, 1
%16 = trunc i33 %15 to i32
%17 = mul i32 %16, 5
%18 = add i32 %14, %17
%19 = shl i32 %0, 2
%20 = add i32 %18, %19
%21 = add i32 %20, -3
br label %22
22: ; preds = %3, %1
%23 = phi i32 [ 0, %1 ], [ %21, %3 ]
ret i32 %23
}
The translated program costs nearly no time to calculate the sum, even though I use i*i*i
or something like this, compiler can also do that.
I am interesting to the optimizations, but I don't know how Clang/gcc implement the above problem.