volatile int num = 0;
num = num + 10;

The C++ code above seems to produce the following Intel-syntax assembly:

mov DWORD PTR [rbp-4], 0
mov eax, DWORD PTR [rbp-4]
add eax, 10
mov DWORD PTR [rbp-4], eax

If I change the C++ code to

volatile int num = 0;
num = num + 0;

why doesn't the compiler produce the following assembly:

mov DWORD PTR [rbp-4], 0
mov eax, DWORD PTR [rbp-4]
add eax, 0
mov DWORD PTR [rbp-4], eax

gcc7.2 -O0 leaves out the add eax, 0, but all the other instructions are the same (Godbolt).

At which stage of compilation does this kind of trivial code get removed? Is there a compiler flag that stops GCC from doing this kind of optimization?

Peter Cordes
kris
  • @vu1p3n0x: I think kris is asking why gcc will just load `num` and store it again, without an `add eax, 0` instruction (because gcc *does* optimize that part away even at `-O0`). – Peter Cordes Oct 09 '17 at 05:17
  • related, and sort of answers the question: [Disable all optimization options in GCC](https://stackoverflow.com/questions/33278757/disable-all-optimization-options-in-gcc). gcc doesn't have a "totally dumb" mode. It always transforms through its internal representations on the way to making an executable. – Peter Cordes Oct 09 '17 at 05:19
  • A guess is that it never "gets removed" but actually never gets added. A compiler doesn't have to work *that hard* to realize that `x + 0` requires no code. – Bo Persson Oct 09 '17 at 10:51
  • If you just want to reserve the immediate byte in machine code for later patching, it's not clear why you don't go with the `10` version and patch that to `0` by default, or to another value in `-128..+127` as needed. (Also note that if you are aiming for binary patching, you may want to use some 32b or 64b constant to get the `add` encoded with a big enough immediate, as values in the `-128..+127` range will use only the imm8 encoding (a single byte for the value), at least with any common assembler (used by gcc).) – Ped7g Oct 09 '17 at 14:46

2 Answers


clang will emit add eax, 0 at -O0, but gcc, ICC, and MSVC will not. See below.


gcc -O0 doesn't mean "no optimization". gcc doesn't have a "braindead literal translation" mode where it tries to transliterate every component of every C expression directly to an asm instruction.

GCC's -O0 is not intended to be totally un-optimized. It's intended to be "compile-fast" and make debugging give the expected results (even if you modify C variables with a debugger, or jump to a different line within the function). So it spills / reloads everything around every C statement, assuming that memory can be asynchronously modified by a debugger stopped before such a block. (Interesting example of the consequences, and a more detailed explanation: Why does integer division by -1 (negative one) result in FPE?)


There isn't much demand for gcc -O0 to make even slower code (e.g. forgetting that 0 is the additive identity), so nobody has implemented an option for that. And it might even make gcc slower if that behaviour were optional. (Or maybe there is such an option but it's on by default even at -O0, because it's fast, doesn't hurt debugging, and is useful. Usually people like it when their debug builds run fast enough to be usable, especially for big or real-time projects.)

As @Basile Starynkevitch explains in Disable all optimization options in GCC, gcc always transforms through its internal representations on the way to making an executable. Just doing this at all results in some kinds of optimizations.

For example, even at -O0, gcc's "divide by a constant" algorithm uses a fixed-point multiplicative inverse or a shift (for powers of 2) instead of an idiv instruction. But clang -O0 will use idiv for x /= 2.
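To make the "fixed-point multiplicative inverse" idea concrete, here is a minimal sketch for unsigned 32-bit division by 10. The function names are hypothetical, and the constant 0xCCCCCCCD with a shift of 35 is the usual choice for this particular divisor; it illustrates the technique, not the exact instructions any given gcc version emits.

```cpp
#include <cassert>
#include <cstdint>

// Straightforward form: the compiler is free to use a divide instruction here.
std::uint32_t div10_plain(std::uint32_t x) {
    return x / 10u;
}

// Same result without a divide: multiply by ceil(2^35 / 10) = 0xCCCCCCCD
// in 64-bit arithmetic, then shift the product right by 35.
std::uint32_t div10_inverse(std::uint32_t x) {
    const std::uint64_t magic = 0xCCCCCCCDull;  // fixed-point reciprocal of 10
    return static_cast<std::uint32_t>((x * magic) >> 35);
}

// Hypothetical helper: spot-check that both forms agree on a few inputs.
void check_div10() {
    const std::uint32_t samples[] = {0u, 9u, 10u, 99u, 12345u,
                                     0x7FFFFFFFu, 0xFFFFFFFFu};
    for (std::uint32_t x : samples) {
        assert(div10_inverse(x) == div10_plain(x));
    }
}
```

The multiply and shift are much cheaper than a hardware divide on typical x86 CPUs, which is presumably why a compiler would bother even in a debug build.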


Clang's -O0 optimizes less than gcc's in this case, too:

void foo(void) {
    volatile int num = 0;
    num = num + 0;
}

asm output on Godbolt for x86-64

    push    rbp
    mov     rbp, rsp

    # your asm block from the question, but with 0 instead of 10
    mov     dword ptr [rbp - 4], 0
    mov     eax, dword ptr [rbp - 4]
    add     eax, 0
    mov     dword ptr [rbp - 4], eax

    pop     rbp
    ret

As you say, gcc leaves out the useless add eax,0. ICC17 stores/reloads multiple times. MSVC is usually extremely literal in debug mode, but even it avoids emitting add eax,0.

Clang is also the only one of the 4 x86 compilers on Godbolt that will use idiv for return x/2;. The others all use SAR + CMOV or a similar sequence to implement C's signed division semantics.
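The reason a bare arithmetic shift is not enough is that signed division truncates toward zero, while a shift rounds toward negative infinity. A minimal sketch of one common fix-up follows (add 1 to negative inputs before shifting). The function names are hypothetical, and this is not necessarily the exact sequence any of these compilers emits.

```cpp
#include <cassert>
#include <climits>

static_assert(sizeof(int) * CHAR_BIT == 32, "sketch assumes 32-bit int");

// Reference: signed division truncates toward zero, so -7 / 2 == -3.
int half_plain(int x) {
    return x / 2;
}

// Shift-based version. Assumption: >> on a negative int is an arithmetic
// shift (guaranteed since C++20, implementation-defined before that).
int half_shift(int x) {
    // Logical shift of the bit pattern: 1 if x is negative, 0 otherwise.
    int bias = static_cast<int>(static_cast<unsigned>(x) >> 31);
    // Adding the bias makes the shift round toward zero for negative x.
    return (x + bias) >> 1;
}

// Hypothetical helper: spot-check that both forms agree, including the extremes.
void check_half() {
    const int samples[] = {0, 1, -1, 7, -7, 8, -8, INT_MAX, INT_MIN};
    for (int x : samples) {
        assert(half_shift(x) == half_plain(x));
    }
}
```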

Peter Cordes
  • Funnily enough, you don't explicitly answer the *"At which part of compilation process"* question, although implicitly it's sort of obvious that it happens in one of the internal steps, when the parsed text is transformed into the internal representation that gcc uses to produce the final machine code. I can't even name it. I'm tempted to say "C++ abstract machine", but the internal representation used by gcc surely has many more attributes and features than the pure C++ abstract machine, and I don't know whether the gcc developers have a name for it. – Ped7g Oct 09 '17 at 14:41
  • @Ped7g: One of GCC's internal representations is called GIMPLE. Another is register-transfer language (RTL), I think. IDK *when* exactly it happens. You can dump the internal rep at various steps, but I'm not an expert on that. – Peter Cordes Oct 09 '17 at 18:26

As per the "as if" rule in C++, an implementation is freely allowed to do whatever it wants, provided that the observable behaviour matches the standard. Specifically, in C++17, 4.6/1 (as one example):

... conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

This provision is sometimes called the "as-if" rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program.

For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.
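Applied to the code in the question, a rough sketch of what the rule permits looks like this. The function names are hypothetical, and the comments describe what a conforming compiler may do, not what any particular compiler does. Accesses to a volatile object count as observable behaviour, so the loads and stores must stay, but the arithmetic between them does not have to.

```cpp
#include <cassert>

int add_zero_volatile() {
    volatile int num = 0;  // volatile store: must be performed
    num = num + 0;         // one volatile load and one volatile store must be
                           // performed; the "+ 0" itself is not observable
    return num;            // another volatile load
}

int add_zero_plain() {
    int num = 0;
    num = num + 0;  // nothing observable: the compiler may drop this entirely
    return num;     // an optimizing build may reduce this to "return 0"
}

// Both versions compute the same value; they differ only in which memory
// accesses the compiler is obliged to keep.
void check_same_result() {
    assert(add_zero_volatile() == add_zero_plain());
}
```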

As to how to control gcc, my first suggestion would be to turn off all optimisation by using the -O0 flag. You can get more fine-tuned control by using various -f<blah> options but -O0 should be a good start.

paxdiablo