Why does GCC treat two similar loops differently?

Question

Note:

If I understand the following code right, it will skip the whole loop, because when compare unsigned (j) and signed (-1), it seems that the -1 will be convert to UINT_MAX. (like this question explained)

The first loop:

unsigned int j = 10;

for (; j > -1; --j) {     --->  `>`
    printf("%u", j);
}

Part of assembly code of first loop:

movq    %rsp, %rbp
.cfi_def_cfa_register 6
movl    %edi, -20(%rbp)
movq    %rsi, -32(%rbp)
movl    $10, -4(%rbp)
nop                           --->**elision**
popq    %rbp
.cfi_def_cfa 7, 8
ret

The second loop of second loop:

unsigned int j = 10;

for (; j >= -1; --j) {  --->  `>=`
    printf("%u", j);
}

Part of assembly code:

movq    %rsp, %rbp
.cfi_def_cfa_register 6
subq    $32, %rsp
movl    %edi, -20(%rbp)
movq    %rsi, -32(%rbp)
movl    $10, -4(%rbp)
jmp .L2                        --->** still a loop here **

.L3:

movl    -4(%rbp), %eax
movl    %eax, %esi
movl    $.LC0, %edi
movl    $0, %eax
call    printf
subl    $1, -4(%rbp)

.L2:

cmpl    $-1, -4(%rbp)
je  .L3
leave
.cfi_def_cfa 7, 8
ret

So my question is

Why the gcc (I use GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2) treat a similar situation, it optimize the first one but didn't the second? (* Or my understand of the code is wrong?) (it has something to do with the assembly?)

Edit: You can go to this site to check. If you just use -S compiler option(or no compiler option) you will get the same result as I do. (Thanks @Raymond Chen for reminder)

step 1:

Open above site and copy the following code to Code Eidtor.

 #include <stdio.h>
 int main (int argc, char *argv[]) {

   unsigned int j = 10;

   for (; j > -1; --j) {    
      printf("%u", j);
   }
 }

step 2:

Choose g++ 4.8 as compiler. Compiler option is empty.(or -S)

step 3:

You get the first situation. Now, change the j > -1 to j >= -1 and you can see the second.

Both optimizations don't look that good: The first loop should be optimized away, the second to one single conditional execution, to even less if j can be statically compared to -1. — Deduplicator, Aug 02 '14 at 13:57
@Deduplicator I cannot find a gcc optimization level that results in the above output. At -O0, both loops are generated. At -O1, both loops are optimized out. The only way I can get the output above is to make `j` a global variable. — Raymond Chen, Aug 02 '14 at 13:59
@RaymondChen: That does not surprise me. Maybe my comment was a bit unclear though... — Deduplicator, Aug 02 '14 at 14:01
@Tony: Is `j` global? And yet another reason for compilable examples in questions… — mafso, Aug 02 '14 at 16:28
http://gcc.godbolt.org/ `#include int main(){unsigned int j=10;for (; j > -1; --j) {printf("%u",j);}}` When compiled at `-O0` both loops are generated; at `-O1` neither is generated. — Raymond Chen, Aug 02 '14 at 16:37

david.pfx · Accepted Answer · 2014-08-03T00:32:34.437

4

The applicable conversion is described in the C standard n1570 S6.3.1.3 as follows:

...if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

So -1 is converted to UINT_MAX which for 32-bit arithmetic is 0xffffffff. This is the same bit pattern, so in assembly language terms it's a no-op.

In the first case the compiler can establish that the loop exit condition is trivially true for all values of the loop variable. No further analysis is needed and at a suitable level of optimisation the loop should be elided.

In the second case there is no such trivial analysis available. However, if the compiler performs a data flow analysis it will discover that the loop exit condition is satisfied before the loop is entered. At a suitable (but perhaps different) level of optimisation this loop can also be elided.

The analysis required is different in each case and harder in the second case. However, I would not care to predict which compilers would perform loop elision in which cases. You would have to test them to find out (as you did).

A note on terminology: the term elision is a more precise description when a compiler decides to omit code entirely. The term optimisation is better used when the compiler makes choices between different possible code generation strategies, perhaps choosing between speed and space.

edited Aug 03 '14 at 00:32

answered Aug 02 '14 at 14:22

david.pfx

10,520
3
30
63

Just a minor correction, `-1U` is equal to `UINT_MAX` and not one smaller. – Jens Gustedt Aug 02 '14 at 16:39
So obvious (after the event). Fixed. Thanks. – david.pfx Aug 03 '14 at 00:32
Could you explain 'same bit pattern, so in assembly language terms it's a no-op' to me? I didn't get you. – Tony Aug 04 '14 at 11:18
The bit pattern with all bits set (eg '0xffffffff') is the signed value `-1` and the unsigned value UINT_MAX so converting one to the other requires no action by the processor. I think you have some reading to do. – david.pfx Aug 04 '14 at 11:24
Sorry, do you mean the nop is for converting? Shouldn't it for loop elision? Any proof? – Tony Aug 04 '14 at 14:42
Now I think you're confusing the NOP instruction emitted by the compiler (which has no obvious purpose) with my no-op terminology, which just means the processor has nothing to do. You don't _need_ to execute a NOP instruction in order to do nothing. – david.pfx Aug 05 '14 at 02:01

davmac · Answer 2 · 2014-08-02T16:59:26.237

2

Probably because 'j > -1' cannot be true for any value of j, whereas 'j >= -1' can be true if j == UINT_MAX. So there is a subtle difference which affects optimisation. In the first case the condition and therefore the loop can be trivially optimised away; in the second case, a slightly more complex analysis is required.

edited Aug 02 '14 at 16:59

answered Aug 02 '14 at 14:04

davmac

20,150
1
40
68

it should be `j == UINTMAX`, no minus one in there. – Jens Gustedt Aug 02 '14 at 16:41

AnT stands with Russia · Answer 3 · 2014-08-06T15:09:48.280

It looks like the compiler did not attempt to take into account the "known" value of j from the previous initialization. It simply interpreted the cycles independently, under assumption that the initial value of j can be anything.

Under such circumstances these two cycles are not even remotely similar.

The first cycle is an "impossible" cycle. It contains iteration condition j > -1, which will be interpreted as j > UINT_MAX. Obviously, this is an impossible condition. So, the compiler decided to completely eliminate the cycle.

The second cycle's condition is not impossible. It is equivalent to j >= UINT_MAX and it is perfectly satisfiable for j == UINT_MAX. So, the second cycle is straightforwardly translated into a perfectly normal code that calls your printf.

Obviously, the cycle in the second version will never make more than one iteration, meaning that there's no real need for an actual cycle. By this is not something the compiler was able (or even tried) to figure out by itself. You asked for a cycle - it gave you a cycle.

Why does GCC treat two similar loops differently?

Note:

step 1:

step 2:

step 3:

3 Answers3

Linked