The x86_64 mul instruction can multiply two 32-bit integers and put the high and low 32 bits of the 64-bit result in EDX:EAX. However, gcc & clang don't seem to emit that code. Instead, for the following source:
uint32_t mulhi32(uint32_t a, uint32_t b) {
    return (a * (uint64_t) b) >> 32;
}
they output:
mov esi, esi
mov edi, edi
imul rdi, rsi
shr rdi, 32
mov rax, rdi
ret
which zero-extends the two inputs to 64 bits (writing a 32-bit register clears the upper half, so the two mov instructions are the zero-extensions), then does a 64×64-bit multiply keeping the low 64 bits of the result (which is the full product, since both inputs fit in 32 bits), then shifts that right by 32, and finally returns it. Madness!
The equivalent for 64 bits:
uint64_t mulhi64(uint64_t a, uint64_t b) {
    return (a * (unsigned __int128) b) >> 64;
}
produces:
mov rax, rdi
mul rsi
mov rax, rdx
ret
Why don't gcc & clang do the equivalent thing for 32 bits?