I have this function:
long long int divideBy10(long long int a){
    return a / 10;
}
it's compiled to:
mov rax, rdi
movabs rcx, 7378697629483820647
imul rcx
mov rax, rdx
shr rax, 63
sar rdx, 2
add rax, rdx
ret
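For context, 7378697629483820647 is 0x6666666666666667 = ceil(2^66 / 10), and the shr/sar/add at the end fixes the result up so it rounds toward zero for negative inputs. Here is a C++ sketch of exactly what that assembly computes (div10_signed is just my name for it):

long long div10_signed(long long a) {
    const long long M = 7378697629483820647LL; // ceil(2^66 / 10) = 0x6666666666666667
    __int128 p = (__int128)a * M;              // imul rcx: full 128-bit signed product
    long long hi = (long long)(p >> 64);       // mov rax, rdx: high 64 bits
    long long q = hi >> 2;                     // sar rdx, 2: arithmetic shift, 66 in total
    q += (unsigned long long)hi >> 63;         // shr rax, 63; add: +1 for negative inputs
    return q;                                  // equals a / 10, rounding toward zero
}

The added sign bit is what implements C++'s round-toward-zero semantics for negative dividends; the unsigned version below gets to drop it.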
If I add __builtin_assume(a > 0);
it's compiled to:
mov rax, rdi
movabs rcx, -3689348814741910323
mul rcx
mov rax, rdx
shr rax, 3
ret
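Here -3689348814741910323 is the two's-complement form of 0xCCCCCCCCCCCCCCCD = ceil(2^67 / 10), so the whole thing reduces to one multiply and one shift (again, div10_unsigned is my own name for the sketch):

unsigned long long div10_unsigned(unsigned long long a) {
    const unsigned long long M = 0xCCCCCCCCCCCCCCCDULL; // ceil(2^67 / 10)
    __uint128_t p = (__uint128_t)a * M;                 // mul rcx: 128-bit product
    return (unsigned long long)(p >> 64) >> 3;          // mov rax, rdx; shr rax, 3
}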
The unsigned version is more efficient because it no longer has to handle negative values. Now if I add __builtin_assume(a < 10000); I would have expected it to compile to a single multiplication without a shift, but that's not the case.
I thought maybe the compiler only tracks whether the number is positive or negative, but
long long int noBranch(long long int a){
    __builtin_assume(a < 400);
    if (a < 500) {
        return a;
    }
    return 0;
}
compiles to just a move instruction, so the compiler is able to track these bounds.
Why are compilers not able to optimize the shift away? (This happens with both GCC and Clang.)
This is the code that I expected to be produced:
long long int d10(long long int a){
    __builtin_assume(a > 0);
    __builtin_assume(a < 10'000);
    __uint128_t e = a;
    e *= 0x199999999999999A;
    return e >> 64;
}
d10(long long): # @d10(long long)
mov rax, rdi
movabs rcx, 1844674407370955162
mul rcx
mov rax, rdx
ret
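A quick brute-force check (mine, not compiler output) confirms that the shift-free form is exact on the assumed range, since 0x199999999999999A = ceil(2^64 / 10):

#include <cassert>

int main() {
    // high64(a * ceil(2^64 / 10)) == a / 10 for every 0 < a < 10'000,
    // so once the range is known, no final shift is required.
    for (unsigned long long a = 1; a < 10'000; ++a) {
        __uint128_t e = (__uint128_t)a * 0x199999999999999A;
        assert((unsigned long long)(e >> 64) == a / 10);
    }
}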
Edit: I'm aware that you need a shift to cover the whole range of long long int. I'm asking why the compiler can't optimize the shift away when the range is restricted.