6

I am working on code optimization and going through gcc internals. I wrote a simple expression in my program and I checked the gimple representation of that expression and I got stuck why gcc had done this. Say I have an expression :

if(i < 9)

then in the gimple representation it will be converted to

if(i <= 8)

I dont know why gcc do this. Is it some kind of optimization, if yes then can anyone tell me how it can optimize our program?

Eimantas
  • 48,927
  • 17
  • 132
  • 168
neel
  • 8,399
  • 7
  • 36
  • 50
  • I can't answer why gcc in particular would do this, but I know why I'd do it if I were writing a compiler. It's cleaner to convert < to <= at a source-to-source translation stage than it is to have code for compiling both < and <= to machine code when both operators should generate the same code. – Eric Finn Jul 11 '12 at 13:31
  • The Cyber compilers (in Pascal and Fortran) did this kind of optimization as well. I think they were more concerned with the constant `8` than `9`, as `8` is produced by shifting a `1` (a constant always held in a register), whereas a `9` needs several operations to generate or (gasp) fetch from a memory constant. I wonder: does it do the same thing for `if (i < 8)`? – wallyk Jul 11 '12 at 14:22
  • yes it will do the same thing for any expression like this – neel Jul 11 '12 at 15:32

2 Answers2

4

The canonalisation helps to detect CommonSubExpressions, such as:

#include <stdio.h>

int main(void)
{
unsigned u, pos;
char buff[40];

for (u=pos=0; u < 10; u++) {
        buff[pos++] = (u <5) ? 'A' + u : 'a' + u;
        buff[pos++] = (u <=4) ? '0' + u : 'A' + u;
        }
buff[pos++] = 0;
printf("=%s=\n", buff);
return 0;
}

GCC -O1 will compile this into:

         ...
        movl    $1, %edx
        movl    $65, %ecx
.L4:
        cmpl    $4, %eax
        ja      .L2
        movb    %cl, (%rsi)
        leal    48(%rax), %r8d
        jmp     .L3
.L2:
        leal    97(%rax), %edi
        movb    %dil, (%rsi)
        movl    %ecx, %r8d
.L3:
        mov     %edx, %edi
        movb    %r8b, (%rsp,%rdi)
        addl    $1, %eax
        addl    $1, %ecx
        addl    $2, %edx
        addq    $2, %rsi
        cmpl    $10, %eax
        jne     .L4
        movb    $0, 20(%rsp)
        movq    %rsp, %rdx
        movl    $.LC0, %esi
        movl    $1, %edi
        movl    $0, %eax
        call    __printf_chk
         ...

GCC -O2 will actually remove the entire loop and replace it by a stream of assignments.

wildplasser
  • 43,142
  • 8
  • 66
  • 109
0

Consider the following C code:

int i = 10;

if(i < 9) {
  puts("1234");
}

And also the equivalent C code:

int i = 10;

if(i <= 8) {
  puts("asdf");
}

Under no optimisation, both generate the exact same assembly sequence:

40052c:       c7 45 fc 0a 00 00 00    movl   $0xa,-0x4(%rbp)
400533:       83 7d fc 08             cmpl   $0x8,-0x4(%rbp)
400537:       7f 0a                   jg     400543 <main+0x1f>
400539:       bf 3c 06 40 00          mov    $0x40063c,%edi
40053e:       e8 d5 fe ff ff          callq  400418 <puts@plt>
400543:       .. .. .. ..             .. ..  ..

Since I am not familiar with the GCC implementation, I can only speculate as to why the conversion is done at all. Perhaps it makes the job of the code generator easier because it only has to handle a single case. I expect someone can come up with a more definitive answer.

Mike Kwan
  • 24,123
  • 12
  • 63
  • 96
  • 2
    Personally, I think it's useful for a compiler to always *canonicalize* code in ways like this. Otherwise, you run into cases where two seemingly-equivalent pieces of code lead to slightly different asm (possibly with different performance), and programmers who think they're a lot more clever than they actually are go comparing asm and writing the form of the code that generates the "better" asm rather than the form of the code that's more representative of what they mean to do. This leads to hideous obfuscated code. – R.. GitHub STOP HELPING ICE Jul 11 '12 at 13:08
  • There is very little information and too much text in this answer – anatolyg Jun 14 '15 at 17:05