void start_here(unsigned int n) {
if (n > 0)
start_here(n - 1);
}
start_here(2019);
Then you write:
double pow(double x, unsigned int exp) {
if (exp > 0)
return x * pow(x, exp - 1);
else
return 1;
}
Then you improve:
double pow(double x, unsigned int exp) {
if (exp > 0) {
const auto p2 = pow(x, exp / 2);
return p2 * p2 * ((exp & 1) ? x : 1);
}
else
return 1;
}
The last algorithm is known as binary exponentiation.
And finally you learn templates:
template<unsigned int exp>
constexpr double pow(double x) {
if constexpr (exp > 0) {
const auto p2 = pow<exp / 2>(x);
return p2 * p2 * ((exp & 1) ? x : 1);
}
else
return 1;
}
Edit. Tail recursion optimization.
Let's take a look at the assembly code generated for the first version of
pow()
without optimizations (-O0
):
pow(double, unsigned int):
push rbp
mov rbp, rsp
sub rsp, 16
movsd QWORD PTR [rbp-8], xmm0
mov DWORD PTR [rbp-12], edi
cmp DWORD PTR [rbp-12], 0
je .L2
mov eax, DWORD PTR [rbp-12]
lea edx, [rax-1]
mov rax, QWORD PTR [rbp-8]
mov edi, edx
movq xmm0, rax
call pow(double, unsigned int)
mulsd xmm0, QWORD PTR [rbp-8]
jmp .L3
.L2:
movsd xmm0, QWORD PTR .LC0[rip]
.L3:
leave
ret
.LC0:
.long 0
.long 1072693248
We see a recursive call pow(double, unsigned int)
.
Now let's add some optimizations (-O2 -ffast-math
):
pow(double, unsigned int):
movapd xmm1, xmm0
movsd xmm0, QWORD PTR .LC0[rip]
test edi, edi
je .L4
.L3:
mulsd xmm0, xmm1
sub edi, 1
jne .L3
ret
.L4:
ret
.LC0:
.long 0
.long 1072693248
Where is the recursive call? It's gone! The compiler employs the tail call optimization and transforms a recursive call into a simple loop. This assembly code is equivalent to this C++ one:
double pow(double x, unsigned int exp) {
double p = 1;
if (exp == 0)
return p;
loop:
p *= x;
if (--exp > 0)
goto loop;
return p;
}
This optimization is not possible without -ffast-math
option due to non-associativity of floating point multiplication.
And finally, 1.d
's is represented in memory by 8 bytes:
3F F0 00 00 | 00 00 00 00 (base 16)
After conversion into two long
numbers, they become:
1072693248 | 0 (base 10)
These are two magic numbers that can be spotted in the assembly code.