I've seen that gcc does a great job in optimizing integer divisions with a constant divisor.
E.g.
uint64_t f(uint64_t x)
{
return x / 1000;
}
uint64_t f2(uint64_t x)
{
return x / 10000;
}
becomes:
f:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
shr rax, 3
movabs rdx, 2361183241434822607
mul rdx
mov rax, rdx
shr rax, 4
pop rbp
ret
f2:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
movabs rdx, 3777893186295716171
mul rdx
mov rax, rdx
shr rax, 11
pop rbp
ret
It seems all divisions can be reduced to a shift + multiplication + shift (not always are all of those operations included).
How is this obtained and how can I get manually the required numbers for those three operations? I specifically need to know this for uint64_t
,
Thanks a lot