I have written the following very simple function, which I am experimenting with in Godbolt's Compiler Explorer:
#include <cstdint>
uint64_t func(uint64_t num, uint64_t den)
{
return num / den;
}
GCC produces the following output, which I would expect:
func(unsigned long, unsigned long):
        mov     rax, rdi
        xor     edx, edx
        div     rsi
        ret
However, Clang 13.0.0 produces the following, which involves a shift and even a branch:
func(unsigned long, unsigned long): # @func(unsigned long, unsigned long)
        mov     rax, rdi
        mov     rcx, rdi
        or      rcx, rsi
        shr     rcx, 32
        je      .LBB0_1
        xor     edx, edx
        div     rsi
        ret
.LBB0_1:
        xor     edx, edx
        div     esi
        ret
When using uint32_t, clang's output is once again "simple" and what I would expect.
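For reference, this is the `uint32_t` variant I mean; only the parameter and return types change (the function name is just what I used for this test):

```cpp
#include <cstdint>

// 32-bit version of the same division: clang compiles this to a
// plain mov / xor / div sequence, just like GCC's 64-bit output.
uint32_t func32(uint32_t num, uint32_t den)
{
    return num / den;
}
```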
It seems this might be some sort of optimization, since Clang 10.0.1 produces the same output as GCC; however, I cannot understand what is happening. Why is clang producing this longer assembly?