This is redundant, and not something you'd see in optimized asm.
Even if the cmp/ja was a possible jump target from somewhere else, existing optimizing compilers like GCC, clang, MSVC, and ICC would (I'm pretty sure) use a jmp or a different code layout instead of letting execution fall into a conditional branch that would never be taken. The optimizer would know there doesn't need to be a conditional branch along this path of execution, so it would make sure execution didn't encounter one. (Even if that cost an additional jmp.)
That's probably a good choice, even in the hypothetical case where some code-size saving was possible this way, because you don't want to pollute / dilute branch-prediction history with unnecessary conditional branches, and the branch could mispredict as taken.
But in debug mode, some compilers are more able to switch off their brains than others for optimizations within a single statement or expression. (Across statements they'd always spill/reload vars to memory, unless you used register int foo;)
I was able to trick clang -O0 and MSVC into emitting that exact sequence of instructions, and GCC into emitting something similar but worse.
(Surprising, because gcc -O0 still does some optimizations inside a single expression, like using a multiplicative inverse for x /= 10; and dead code removal for if(false), vs. MSVC actually putting a 0 in a register and testing that it's 0.)
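As an aside, the multiplicative-inverse trick mentioned above can be sketched in C. This is only an illustration for the unsigned 32-bit case (an assumption; the function name div10_mulinv is made up here, and for signed int the compiler's real sequence uses a different constant plus a sign fix-up):

```c
#include <stdint.h>

/* Hypothetical helper: x / 10 without a div instruction.
   0xCCCCCCCD is ceil(2^35 / 10), so multiplying by it and shifting
   right by 35 gives the exact quotient for every 32-bit unsigned x. */
uint32_t div10_mulinv(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

The point is just that this kind of strength reduction happens inside one expression, so seeing it at -O0 doesn't contradict the spill/reload behaviour between statements.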
void dummy();
unsigned char set_al();
int foo(void) {
    if ((set_al() & 3) <= 3U)
        dummy();
    return 0;
}
clang12.0 for x86-64 Linux (on Godbolt)
        push    rbp
        mov     rbp, rsp
        call    set_al()
        movzx   eax, al          # The redundant sequence
        and     eax, 3
        cmp     eax, 3
        ja      .LBB0_2
        call    dummy()
.LBB0_2:
        xor     eax, eax
        pop     rbp
        ret
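The reason the cmp/ja can never be taken is that and eax, 3 keeps only the low two bits, so the value is always in the range 0..3. A minimal sketch that checks this exhaustively for every unsigned char value (the helper name count_condition_true is hypothetical, not part of the original test case):

```c
#include <limits.h>

/* Hypothetical helper: counts how many unsigned char values satisfy
   the same condition that foo() tests. */
int count_condition_true(void) {
    int count = 0;
    for (int v = 0; v <= UCHAR_MAX; v++) {
        unsigned char c = (unsigned char)v;
        if ((c & 3) <= 3U)      /* same condition as in foo() */
            count++;
    }
    return count;
}
```

The count comes out equal to UCHAR_MAX + 1, i.e. the condition holds for every possible input, so the call to dummy() could be unconditional.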
MSVC contained the same sequence. GCC10.3 was similar but worse, materializing a boolean in a register and testing it. (Both are also in the same Godbolt link.)
## GCC10.3
... set up RBP as a frame pointer
        movzx   eax, al          # The redundant sequence
        and     eax, 3
        cmp     eax, 3
        setbe   al
        test    al, al           # even worse than just jnbe
        je      .L2
        call    dummy()
.L2:
        mov     eax, 0
        pop     rbp
        ret
With the char coming from memory instead of a return value, GCC does optimize away the compare even in debug mode:
int bar(unsigned char *p) {
    if ((*p & 3) <= 3U)
        dummy();
    return 0;
}
# GCC 10.3 -O0
bar(unsigned char*):
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16                  # space to spill the function arg
        mov     QWORD PTR [rbp-8], rdi
        call    dummy()                  # unconditional call
        mov     eax, 0
        leave
        ret
clang and MSVC do the test, both with asm like
#MSVC19.28 (VS16.9) default options (debug mode)
...
        movzx   eax, BYTE PTR [rax]
        and     eax, 3
        cmp     eax, 3
        ja      SHORT $LN2@bar
...