
I wanted to look into how certain C/C++ features are translated into assembly, so I created the following file:

struct foo {
    int x;
    char y[0];  /* zero-length array: a GNU extension; C99 spells this char y[] */
};

char *bar(struct foo *f)
{
    return f->y;
}
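
The `4` that the listing below adds to the pointer (`movabsq $4, %rcx` / `addq %rcx, %rax`) is just the byte offset of `y` inside `struct foo`: `bar` never dereferences `f`, it only does pointer arithmetic. A minimal sketch of that equivalence, assuming a 4-byte `int` as on x86-64 and using the standard C99 flexible array member `char y[]` in place of GCC's zero-length `char y[0]` (both place `y` right after `x` here); `bar_by_hand` is a hypothetical helper added only for comparison:

```c
#include <stddef.h>

/* Same layout as the question's struct, spelled with the C99 flexible
   array member char y[] rather than the GNU zero-length array char y[0]. */
struct foo {
    int x;
    char y[];
};

/* What bar() computes: no memory behind f is read, it only adds the
   byte offset of y to the pointer it was given. */
char *bar(struct foo *f)
{
    return f->y;
}

/* Hypothetical helper: the same address written out by hand. */
char *bar_by_hand(struct foo *f)
{
    return (char *)f + offsetof(struct foo, y);
}
```

Calling both on the same object returns the same pointer, and `offsetof(struct foo, y)` evaluates to 4 when `int` is 4 bytes, which is the constant the compiler folds into `leaq 4(%rdi), %rax` once optimization is enabled.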

I then compiled this with gcc -S (and also tried g++ -S), but when I looked at the assembly, I was disappointed to find a trivial redundancy in the bar function that I thought gcc should be able to optimize away:

_bar:
Leh_func_begin1:
        pushq   %rbp
Ltmp0:
        movq    %rsp, %rbp
Ltmp1:
        movq    %rdi, -8(%rbp)
        movq    -8(%rbp), %rax
        movabsq $4, %rcx
        addq    %rcx, %rax
        movq    %rax, -24(%rbp)
        movq    -24(%rbp), %rax
        movq    %rax, -16(%rbp)
        movq    -16(%rbp), %rax
        popq    %rbp
        ret
Leh_func_end1:

Among other things, the lines

        movq    %rax, -24(%rbp)
        movq    -24(%rbp), %rax
        movq    %rax, -16(%rbp)
        movq    -16(%rbp), %rax

seem pointlessly redundant. Is there any reason gcc (and possibly other compilers) cannot, or does not, optimize this away?

Matt
  • Please run gcc with the -O switch to enable standard optimizations. – Dima Chubarov May 19 '12 at 06:47
  • which version of gcc are you using? – Dima Chubarov May 19 '12 at 06:50
  • More recent duplicate [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394) which goes into more detail than the answers here about why some of this makes sense for consistent debugging. (The basic point is the same, the answers here are correct.) Related: [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) – Peter Cordes May 16 '22 at 22:04

2 Answers


"I thought gcc should be able to optimize away."

From the gcc manual:

"Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results."

In other words, it doesn't optimize unless you ask it to. When I turn on optimizations using the -O3 flag, gcc 4.4.6 produces much more efficient code:

bar:
.LFB0:
        .cfi_startproc
        leaq    4(%rdi), %rax
        ret
        .cfi_endproc

For more details, see "Options That Control Optimization" in the gcc manual.
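
Since optimization is off unless requested, it can be handy to check which mode a build actually used. A small sketch relying on GCC's predefined `__OPTIMIZE__` macro, which GCC documents as defined in all optimizing compilations (Clang defines it too) and which is absent at the default -O0; the function name `built_with_optimization` is hypothetical:

```c
/* Returns 1 if this translation unit was compiled with optimization
   enabled (e.g. -O, -O2, -O3), 0 otherwise. With no -O flag gcc uses
   -O0, so __OPTIMIZE__ is not defined and this returns 0. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

Compiling the same file once with plain `gcc` and once with `gcc -O2` and printing the result should show 0 and then 1.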

NPE
  • Oh, I assumed standard optimizations would be on by default. Why aren't they? – Matt May 19 '12 at 06:56
  • @Matt: To quote the manual, "Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results." – NPE May 19 '12 at 06:59
  • @Matt And because the implementors so chose. Unless you score an answer from one of them here, it's a futile question. – user207421 May 19 '12 at 16:18

Without optimization, the compiler typically produces a straight instruction-by-instruction translation, and the instructions being translated are not those of your program but those of an intermediate representation, in which redundancy may already have been introduced.
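
One plausible reading of the unoptimized listing, written back as C: every intermediate value gets its own stack slot, and each step stores to a slot and immediately reloads from it. This is a hypothetical reconstruction; the compiler's real intermediate temporaries are not visible in the `-S` output, and the names `bar_o0_style` and `bar_optimized` are made up for illustration:

```c
struct foo {
    int x;
    char y[];   /* C99 spelling of the question's char y[0] */
};

/* Hypothetical C rendering of the -O0 code: one stack slot per value. */
char *bar_o0_style(struct foo *f)
{
    struct foo *f_slot = f;   /* movq %rdi, -8(%rbp)                        */
    char *tmp = f_slot->y;    /* reload -8(%rbp), add 4, store to -24(%rbp) */
    char *retval = tmp;       /* reload -24(%rbp), store to -16(%rbp)       */
    return retval;            /* reload -16(%rbp) into %rax, ret            */
}

/* What an optimizer reduces it to: leaq 4(%rdi), %rax ; ret */
char *bar_optimized(struct foo *f)
{
    return f->y;
}
```

Both functions return the same pointer; the extra slots exist only because the unoptimized translation keeps every intermediate value in memory, which is also what lets a debugger inspect each one.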

If you expect assembly without such redundant instructions, use gcc -O -S.

The kind of optimization you were expecting is called peephole optimization. Compilers usually have plenty of these, because unlike more global optimizations, they are cheap to apply and (generally) do not risk making the code worse, at least when applied towards the end of compilation.
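
A toy version of one such peephole rule, applied to the four moves quoted in the question: in the pair `movq R, M` / `movq M, R`, the second instruction reloads a value that `R` already holds, so it can be dropped. This is only a sketch; real compilers run such passes on their internal instruction representation rather than on assembly text, and `struct insn` and `drop_redundant_reloads` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* A two-operand move in AT&T order: movq src, dst. */
struct insn {
    const char *src;
    const char *dst;
};

/* Drops every "movq M, R" that directly follows a kept "movq R, M".
   code[] is compacted in place; the new instruction count is returned. */
size_t drop_redundant_reloads(struct insn *code, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (out > 0 &&
            strcmp(code[i].src, code[out - 1].dst) == 0 &&
            strcmp(code[i].dst, code[out - 1].src) == 0) {
            continue;   /* R already holds M's value: skip the reload */
        }
        code[out++] = code[i];
    }
    return out;
}
```

Fed the four quoted moves, it keeps only the two stores `movq %rax, -24(%rbp)` and `movq %rax, -16(%rbp)`. Deleting those as well needs a second, less local fact (that the slots are never read again before `ret`), which is the kind of information an optimizing build has and -O0 deliberately does not use.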

In this blog post, I provide an example where both GCC and Clang may go as far as generating shorter 32-bit instructions when the integer type in the source code is 64-bit but only the lowest 32 bits of the result matter.
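
The arithmetic fact that makes this legal: for unsigned addition and multiplication, the low 32 bits of the result depend only on the low 32 bits of the operands. A small sketch of that equivalence (the function names are hypothetical and not taken from the blog post):

```c
#include <stdint.h>

/* Full 64-bit computation, then truncation to the low 32 bits. */
uint32_t low32_of_wide(uint64_t a, uint64_t b)
{
    return (uint32_t)(a * b + a);
}

/* The same result computed only on the low 32 bits of each operand:
   arithmetic modulo 2^64 agrees with arithmetic modulo 2^32 on those bits. */
uint32_t low32_narrow(uint64_t a, uint64_t b)
{
    uint32_t a32 = (uint32_t)a;
    uint32_t b32 = (uint32_t)b;
    return (uint32_t)(a32 * b32) + a32;
}
```

Because the two functions always agree, an optimizing compiler is free to implement the first one with 32-bit instructions; whether a particular GCC or Clang version actually does so is something to check in its `-O2 -S` output.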

Pascal Cuoq