I was curious whether the compilers do the obvious optimization for code like `N % <some power of 2> == 0` (or `!= 0`). Indeed they do, but there are some interesting nuances.
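For reference, here is a minimal sketch of the kind of functions involved (the names are mine, just to anchor the assembly below; the exact snippets are in the godbolt links):

```
#include <cstdint>

bool even(uint64_t i)        { return i % 2 == 0; }
bool div4(uint64_t i)        { return i % 4 == 0; }
bool div8_signed(int64_t i)  { return i % 8 == 0; }  // signed variant, see the MSVC question
```

So here are two questions: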
- Why do GCC and MSVC produce seemingly more complex output than clang for `i % 2` (https://godbolt.org/z/KaWrYoz1a), but all three produce the same, simpler output for the divisors 4, 8, 16?
clang output (expected):
```
test dil, 1    ; ZF = 1 iff the low bit of i is clear, i.e. i is even
sete al        ; al = ZF
```
GCC output:
```
mov rax, rdi   ; rax = i
not rax        ; flip all bits: low bit is now 1 iff i was even
and eax, 1     ; keep only the low bit: result = (~i) & 1
```
MSVC output is the same, except the `mov` and `not` are in the opposite order.
- Why does MSVC produce much more complex code for signed integers (both 32- and 64-bit) for the divisors 4 and 8 (https://godbolt.org/z/xP1qcch8q)? Clang and GCC are unaffected by signedness.
```
mov rax, rcx   ; rax = i
cdq            ; edx = (i < 0) ? -1 : 0 (sign-extend eax into edx)
and edx, 3     ; edx = bias: 3 for negative i, 0 otherwise
add rax, rdx   ; rax = i + bias
and eax, 3     ; eax = (i + bias) & 3
cmp rax, rdx   ; ((i + bias) & 3) == bias  <=>  i % 4 == 0
sete al
```
And why does MSVC still produce the simple code for the divisor 2?
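As far as I can tell, the simple code is valid for signed types because, under C++'s truncated division, divisibility by a power of two reduces to a bit mask even for negative values. A quick self-check of that equivalence (my own sketch, not from the links):

```
#include <cassert>

// x % 4 == 0 exactly when the low two bits of x are clear,
// for negative x as well (truncated division: -4 % 4 == 0, -3 % 4 == -3)
bool mod4(int x)  { return x % 4 == 0; }
bool mask4(int x) { return (x & 3) == 0; }

int main() {
    for (int x = -1024; x <= 1024; ++x)
        assert(mod4(x) == mask4(x));
}
```

So the biased-remainder sequence above looks unnecessary when the result is only compared against zero.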
P. S. The compilers are at their maximum optimization level, but I am not specifying any architecture flags (`-march`).