Does gcc/g++ have flags to enable or disable arithmetic optimisations, e.g. where a+a+...+a is replaced by n*a when a is an integer? In particular, can this be disabled when using -O2 or -O3?

In the example below even with -O0 the add operations are replaced by a single multiplication:

$ cat add1.cpp
unsigned int multiply_by_22(unsigned int a)
{
    return a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a;
}

$ g++ -S -masm=intel -O0 add1.cpp

$ cat add1.s
...
        imul    eax, edi, 22

Even disabling all the flags used in -O0 (see g++ -c -Q -O0 --help=optimizers | grep enabled) still produces the imul operation.

With a loop, it takes -O1 before the repeated addition is simplified to a single multiplication:

$ cat add2.cpp
unsigned int multiply(unsigned int a, unsigned int b)
{
    unsigned int sum=0;
    for(unsigned int i=0; i<b; i++)
        sum += a;
    return sum;
}

$ g++ -S -masm=intel -O1 add2.cpp

$ cat add2.s
...
        mov     eax, 0
.L3:
        add     eax, 1
        cmp     esi, eax
        jne     .L3
        imul    eax, edi
        ret

I.e. -O1 has moved the sum += a; outside the loop and replaced it by a single multiplication. With -O2 it will also remove the dead loop.
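
To make the transformation concrete, here is a rough source-level sketch of what the -O1 and -O2 outputs compute. The names `multiply_as_if_O1` and `multiply_as_if_O2` are made up for illustration, and the compiler of course works on its internal representation rather than on source like this. The sketch only relies on unsigned arithmetic wrapping modulo 2^32, which is why the repeated addition and the multiplication always agree:

```cpp
// Original loop from add2.cpp: b additions of a.
unsigned int multiply(unsigned int a, unsigned int b)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < b; i++)
        sum += a;
    return sum;
}

// Hypothetical source-level view of the -O1 output shown above:
// the loop only counts up to b, and a single multiplication follows it.
unsigned int multiply_as_if_O1(unsigned int a, unsigned int b)
{
    unsigned int i = 0;
    while (i < b)
        i++;
    return i * a;
}

// Hypothetical source-level view of -O2: the counting loop has no
// remaining effect, so only the multiplication is left.
unsigned int multiply_as_if_O2(unsigned int a, unsigned int b)
{
    return a * b;
}
```

All three functions return the same value for any pair of inputs, including inputs where the sum wraps around.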

I'm just asking out of interest as I was trying to time some basic integer operations and noticed that the compiler optimised my loops away and I couldn't find any flags to disable this.

phuclv
user1059432
  • I was searching for some pragmas that might force the behaviour you are talking about, but I only found https://stackoverflow.com/questions/2219829/how-to-prevent-gcc-optimizing-some-statements-in-c that sounded good... but all answers explain how to locally set `-O0`, and we know it is not enough for you. – Roberto Caboni Dec 08 '19 at 20:41
  • Mostly there is no such flag. You can modify the code (add volatile, put the operations in separate statements). For signed type, the undefined sanitizer might also prevent some optimizations. – Marc Glisse Dec 08 '19 at 20:41
  • Maybe `asm volatile("" : "=r"(a) : : "memory");`? – S.S. Anne Dec 08 '19 at 20:50
  • Alternatively you could try with `asm` keyword supported by gcc (look for documentation online) and write an explicit assembler section for your sum. I'm not sure about it, and that's the reason why I cannot write an answer about it, but I'm confident it could work. Since I'm not an Intel asm expert, I would start writing a simple a+b sum program, open the asm and extend it in order to have the a*22 sum. Then I would put it into the asm section. – Roberto Caboni Dec 08 '19 at 20:53
  • Interestingly enough GCC 4.1.2 seems to produce exactly what you expect: https://godbolt.org/z/8QmFFv Though changing compiler version might not be relevant at all. – koitimes3 Dec 08 '19 at 20:53
  • Why would you want to time something that never happens in real code? – n. m. could be an AI Dec 09 '19 at 10:06
  • @n.'pronouns'm. purely to time `add`, `imul`, `idiv` operations etc, and I know there are some good manuals online like https://www.agner.org/optimize/instruction_tables.pdf but it's always nice to replicate locally :) – user1059432 Dec 09 '19 at 11:44
  • If you want to time certain assembly instructions, write those exact assembly instructions in assembly. – n. m. could be an AI Dec 09 '19 at 12:19

2 Answers


I do not know of any such compiler flag.

Maybe you can try to use volatile as a substitute:

unsigned int multiply_by_22(volatile unsigned int a)
{
    return a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a;
}

With -O0 you get:

push    rbp
mov     rbp, rsp
mov     DWORD PTR [rbp-4], edi
mov     edx, DWORD PTR [rbp-4]
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax

etc...

For -O2 or -O3 generated code, you can visit: https://godbolt.org/z/Bk2b6Z

Picaud Vincent
  • `volatile` certainly works but has the disadvantage that even with `-O2` it won't be stored in the cpu registers and always requires memory access. – user1059432 Dec 08 '19 at 22:28
  • @user1059432 Yes, I do agree with that. – Picaud Vincent Dec 08 '19 at 22:30
  • Note that `clang` does not require `volatile` to disable the multiplication at `-O0`, but still stores `edi` to memory `[ebp-4]` to perform the additions. At `-O1`, it optimises the addition without an `imul` instruction, with 2 `lea` and an `add`. The code generated by `gcc` at `-O1` and `-O2` with the `volatile` is horrible. – chqrlie Dec 09 '19 at 00:47
  • @user1059432 why are you worrying about the value not being stored in register when you're trying to pessimize the output binary? – phuclv Dec 09 '19 at 10:38
  • @phuclv because I'm interested in the timing of e.g. `add` and slow memory access like `mov eax, DWORD PTR [rbp-4]` will distort this. – user1059432 Dec 09 '19 at 11:46
  • @user1059432 in that case this is about micro-benchmarking and not about compiler optimization. Assembly or inline assembly is the solution for that – phuclv Dec 09 '19 at 13:18

In the absence of any compiler flags, I see only two options to force the compiler to emit add instructions:

  • Write a more complex series of additions that the compiler can't optimise away, e.g. the Fibonacci series (although this will overflow quickly):
$ cat fibonacci.cpp
unsigned int fibonacci(unsigned int ops)
{
    unsigned int a=1;
    unsigned int b=1;
    for(unsigned int i=0; i<ops/2; i++) {
        a+=b;
        b+=a;
    }
    return b;
}

$ g++ -Wall -S -masm=intel -O3 --unroll-loops fibonacci.cpp

$ cat fibonacci.s
...
.L3:
        add     edx, eax
        add     ecx, 8
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        cmp     ecx, edi
        jne     .L3
  • Write an assembly routine which emits add operations:
unsigned int multiply_by_5(unsigned int a)
{
   unsigned int sum = 0;
   asm ( "# start multiply_by_5\n\t"
         "movl %1, %%ebx\n\t"           // ebx = a
         "movl $0, %%eax\n\t"           // eax = 0 (sum = 0)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "movl %%eax, %0\n\t"           // sum = eax
         "# end multiply_by_5\n"
         : "=m" (sum) : "m" (a) : "%eax", "%ebx");
   return sum;
}
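
A possible third option builds on the empty `asm` suggestion in the comments under the question: place an empty asm statement with a `"+r"` constraint after each addition. This is only a sketch and assumes GCC or Clang (GNU extended asm); `opaque` and `multiply_by_22_barrier` are hypothetical names. The `"+r"` constraint tells the compiler that the value is read and possibly modified in a register, so it cannot fold the additions into one multiplication, and unlike `volatile` the value does not have to go through memory. The exact output may vary between compiler versions and flags, so it is worth inspecting the generated assembly before trusting any timing:

```cpp
// Assumption: GNU extended asm (GCC or Clang). The helper name is hypothetical.
// The asm body is empty, so no instruction is emitted for it; the "+r"
// constraint only makes the compiler treat x as changed in a register.
static inline void opaque(unsigned int &x)
{
    asm volatile("" : "+r"(x));
}

unsigned int multiply_by_22_barrier(unsigned int a)
{
    unsigned int sum = 0;
    for (int i = 0; i < 22; i++) {
        sum += a;      // one addition per iteration
        opaque(sum);   // hides the value of sum from the optimiser
    }
    return sum;
}
```

Compiling with `g++ -S -masm=intel -O2` and looking for the additions in the `.s` file, as in the question, is a quick way to check what was actually emitted.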
user1059432