clang will emit add eax, 0 at -O0, but gcc, ICC, and MSVC will not. See below.
gcc -O0 doesn't mean "no optimization". gcc doesn't have a "braindead literal translation" mode where it tries to transliterate every component of every C expression directly to an asm instruction.
GCC's -O0 is not intended to be totally un-optimized. It's intended to compile fast and to make debugging give the expected results (even if you modify C variables with a debugger, or jump to a different line within the function). So it spills / reloads everything around every C statement, assuming that memory can be asynchronously modified by a debugger stopped before such a block. (Interesting example of the consequences, and a more detailed explanation: Why does integer division by -1 (negative one) result in FPE?)
There isn't much demand for gcc -O0 to make even slower code (e.g. by forgetting that 0 is the additive identity), so nobody has implemented an option for that. And it might even make gcc itself slower if that behaviour were optional. (Or maybe there is such an option but it's on by default even at -O0, because it's fast, doesn't hurt debugging, and is useful. Usually people like it when their debug builds run fast enough to be usable, especially for big or real-time projects.)
As @Basile Starynkevitch explains in Disable all optimization options in GCC, gcc always transforms through its internal representations on the way to making an executable. Just doing this at all results in some kinds of optimizations.
For example, even at -O0, gcc's "divide by a constant" algorithm uses a fixed-point multiplicative inverse or a shift (for powers of 2) instead of an idiv instruction. But clang -O0 will use idiv for x /= 2.
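To make the "fixed-point multiplicative inverse or a shift" idea concrete, here is a minimal C sketch of what that transformation computes. The function names are made up for illustration, and the constant is the well-known one for 32-bit unsigned division by 10, not something taken from a particular gcc version's output:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned divide by 10 via a fixed-point multiplicative inverse.
   0xCCCCCCCD is ceil(2^35 / 10); taking the 64-bit product and shifting
   right by 35 yields the exact quotient for every 32-bit x. */
static uint32_t div10_mulinv(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Unsigned divide by a power of 2 is just a logical right shift. */
static uint32_t div8_shift(uint32_t x)
{
    return x >> 3;
}

/* Spot-check both helpers against the plain / operator. */
static void check_div_helpers(void)
{
    const uint32_t samples[] = { 0u, 1u, 9u, 10u, 99u, 12345u,
                                 0x7FFFFFFFu, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(div10_mulinv(samples[i]) == samples[i] / 10u);
        assert(div8_shift(samples[i]) == samples[i] / 8u);
    }
}
```

Calling check_div_helpers() from a test main should pass silently. The multiply + shift is much cheaper than a hardware divide, which is presumably why gcc keeps doing it even at -O0.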
Clang's -O0 optimizes less than gcc's in this case, too:
void foo(void) {
    volatile int num = 0;
    num = num + 0;
}
clang -O0 asm output on Godbolt for x86-64:
    push    rbp
    mov     rbp, rsp
    # your asm block from the question, but with 0 instead of 10
    mov     dword ptr [rbp - 4], 0
    mov     eax, dword ptr [rbp - 4]
    add     eax, 0
    mov     dword ptr [rbp - 4], eax
    pop     rbp
    ret
As you say, gcc leaves out the useless add eax,0. ICC17 stores/reloads multiple times. MSVC is usually extremely literal in debug mode, but even it avoids emitting add eax,0.
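Clang is also the only one of the 4 x86 compilers on Godbolt that will use idiv for return x/2;. The others all use SAR + CMOV or whatever shift-based sequence implements C's signed division semantics (rounding toward zero, unlike a bare arithmetic shift, which rounds toward negative infinity).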
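As a rough illustration of why a bare SAR isn't enough for signed x/2, here is a small C sketch of the usual sign fix-up. The helper name is hypothetical, and the code assumes that >> on a negative int is an arithmetic shift, which is implementation-defined in C but true of the mainstream x86 compilers discussed here:

```c
#include <assert.h>
#include <limits.h>

/* Signed divide by 2 without a divide instruction.
   A bare arithmetic shift rounds toward negative infinity, but C's /
   truncates toward zero, so add 1 to negative inputs before shifting.
   Assumption: >> on a negative int is an arithmetic shift. */
static int div2_shift(int x)
{
    /* Logical shift of the sign bit down to bit 0: 1 if x < 0, else 0. */
    int bias = (int)((unsigned)x >> (sizeof(int) * CHAR_BIT - 1));
    return (x + bias) >> 1;
}

/* Spot-check the helper against the plain / operator. */
static void check_div2(void)
{
    const int samples[] = { 0, 1, -1, 7, -7, 8, -8, INT_MAX, INT_MIN };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(div2_shift(samples[i]) == samples[i] / 2);
}
```

The addition can't overflow, because the bias is only non-zero when x is negative. Compilers may pick a different instruction sequence for the same fix-up (the exact choice varies by compiler and divisor), but the arithmetic is the same idea.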