
Compiling this:

#include <iostream>

int main()
{
    for (int i = 0; i < 4; ++i)
        std::cout << i*1000000000 << std::endl;
}

and gcc produces the following warning:

warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]
   std::cout << i*1000000000 << std::endl;
                  ^

I understand there is a signed integer overflow.

What I can't understand is why the value of `i` itself gets broken by that overflow operation.

I've read the answers to Why does integer overflow on x86 with GCC cause an infinite loop?, but I'm still not clear on why this happens - I get that "undefined" means "anything can happen", but what's the underlying cause of this specific behavior?

Online: http://ideone.com/dMrRKR

Compiler: gcc (4.8)

zerkms
    Signed integer overflow => Undefined Behaviour => Nasal Daemons. But I have to admit, that example is quite nice. – dyp Jun 18 '14 at 23:29
  • funny enough, it works on my side (both g++4.9 and clang (llvm 5.1)) on OS X. but this is indeed a nicely picked ub example... – vsoftco Jun 18 '14 at 23:31
  • @ghostofstandardspast: I hope it will be soon at http://stackoverflow.com/questions/24296571/why-this-code-outputs-more-than-4-lines#comment37546453_24296605 :-) – zerkms Jun 18 '14 at 23:35
    Assembly output: http://goo.gl/TtPmZn – Bryan Chen Jun 18 '14 at 23:38
    Happens on GCC 4.8 with the `O2`, and `O3` flag, but not `O0` or `O1` – Alex Jun 18 '14 at 23:42
  • The compiler optimizes out the multiplication, which is perfectly legal (and good) for it to do. But because the code has a bug (signed integer overflow is a bug), this optimization changes its behavior. – David Schwartz Jun 19 '14 at 01:13
  • @David Schwartz: as it is mentioned here: http://stackoverflow.com/a/24297220/251311 it's not that easy. And it's easy to check http://ideone.com/TTLXha, without `std::endl` this optimization is not performed. – zerkms Jun 19 '14 at 01:14
    @dyp when I read Nasal Daemons I did the "imgur laugh" which consists of breathing out your nose slightly when you see something funny. And then I realized... I must be cursed by a Nasal Daemon! – corsiKa Jun 19 '14 at 02:07
  • @dyp John Regher explains this behavior, I linked his article in my answer. – Shafik Yaghmour Jun 19 '14 at 02:27
    Using `-fno-aggressive-loop-optimizations` will stop the aggressive loop optimization, this is documented "feature" as my answer explains. – Shafik Yaghmour Jun 19 '14 at 02:54
  • I can't reproduce on LLVM, what version and flags are you using? – Shafik Yaghmour Jun 19 '14 at 03:25
    @Shafik Yaghmour: it was a mistake - I misinterpreted a message from a friend of mine and thought it is reproducible on llvm, while it's not. Removed to avoid further confusion, sorry. – zerkms Jun 19 '14 at 03:38
    possible duplicate of [Why does integer overflow on x86 with GCC cause an infinite loop?](http://stackoverflow.com/questions/7682477/why-does-integer-overflow-on-x86-with-gcc-cause-an-infinite-loop) – Abyx Jun 19 '14 at 11:54
    @Abyx although they definitely seem closely related it is very hard to tell if this is the same problem. The gcc document I link to in my answer seems to indicate the optimization being triggered is rather new. While the question you link to covers a very old version of gcc. So it seems like same symptoms caused by different code paths in the gcc compiler. – Shafik Yaghmour Jun 19 '14 at 12:56
    @BryanChen I don't believe any analysis of the generated assembly will allow you to deduce what is going on here, since this effect is happening during optimization. You will see the side effects of course, but not the intention that went into what eventually generated the assembly. There is a lot of assembly in the answers but I don't think it enlightens the situation much. – Shafik Yaghmour Jun 19 '14 at 15:26
  • I do not think it is obvious that the claimed dup is the same. The aggressive loop optimization feature, as far as I can tell, did not exist all the way back in `gcc 4.5`, or at least the `gcc` docs strongly imply it is a new feature. I don't have a way of verifying this and unless someone can verify it I think it is dubious to claim them as identical. – Shafik Yaghmour Jun 20 '14 at 03:46
    So the aggressive loop optimization was added [in gcc 4.8](https://access.redhat.com/site/documentation/en-US/Red_Hat_Developer_Toolset/2/html/User_Guide/sect-Changes_in_Version_2.0-GCC.html) so it is literally impossible that these are dups simply by the fact that the other thread is too old. – Shafik Yaghmour Jun 20 '14 at 04:44
  • @Shafik Yaghmour: if it will be closed we always can vote for opening again :-) – zerkms Jun 20 '14 at 04:46
  • Oh man... Your question contains an answer - undefined behavior. – ST3 Jun 20 '14 at 06:58
  • I did the rollback because of the warning that wasn't here before, which changes the question context (the original question didn't even mention UB) – milleniumbug Jun 20 '14 at 11:58
  • @milleniumbug I don't believe this is a valid edit. Editing the question to add context is encouraged and is good. There is no way to generate this behavior without the warning and therefore that is critical context. I will rollback to *6* and the OP can decided whether to re-add the other text. If you strongly disagree feel free to flag for moderator attention. – Shafik Yaghmour Jun 20 '14 at 14:15
    Bookmarking this so I can link it next time someone retorts "it's technically UB but it should do *thing*" :) – M.M Jun 20 '14 at 23:10
  • @Matt McNabb: there was such a comment before wasn't it? :-) – zerkms Jun 20 '14 at 23:17
  • @zerkms i thought so but either it disappeared or my memory is bad – M.M Jun 20 '14 at 23:20
    I mentioned this question on this [blog entry](http://blogs.msdn.com/b/oldnewthing/archive/2014/06/27/10537746.aspx?CommentPosted=true#commentmessage) looks like it is driving some traffic and votes here. – Shafik Yaghmour Jun 27 '14 at 21:13
  • "AN IMPORTANT note for people voting for closing:..." Just because there are good answers doesn't mean it shouldn't be closed. Closing this question won't remove the answers or the value that they and this question add to the community. – Cornstalks Jun 28 '14 at 00:32
  • @ShafikYaghmour: I don't agree that this wasn't an issue before v4.8. The reason for the nontermination in both of these questions is *exactly* the same: the optimizer exploiting the undefined behavior of signed integer overflow. If you [read this](http://www.airs.com/blog/archives/120), you'll notice that all the way back in 2008 these kinds of loops could result in nontermination. GCC 4.8 wasn't the first release to do this kind of optimization; earlier releases did it too. However, GCC 4.8 *got better at it* and became more aggressive. This *is* a duplicate, and should be closed. – Cornstalks Jun 28 '14 at 15:57
  • @ShafikYaghmour: No, it just means you need to change your answer to use `-fwrapv`. `-fno-aggressive-loop-optimizations` probably isn't what's wanted, as it still leaves signed integer overflow leading to undefined behavior. Also, `-fwrapv` was available back in that other question. Just because you have two different answers doesn't mean that both questions aren't *exactly the same*. Also, let's stay focused; the question is "why does this happen" and not "how do I fix it." – Cornstalks Jun 28 '14 at 16:28
  • @zerkms [here is another case](http://stackoverflow.com/q/32506643/1708801) where this optimization strikes. In this case due to out of bounds access. – Shafik Yaghmour Sep 14 '15 at 17:16

5 Answers


Signed integer overflow (strictly speaking, there is no such thing as "unsigned integer overflow") means undefined behaviour. And this means anything can happen, so discussing why it happens under the rules of C++ doesn't make sense.

C++11 draft N3337: §5/4:1

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [ Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —end note ]

Your code, compiled with g++ -O3, emits a warning (even without -Wall):

a.cpp: In function 'int main()':
a.cpp:11:18: warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]
   std::cout << i*1000000000 << std::endl;
                  ^
a.cpp:9:2: note: containing loop
  for (int i = 0; i < 4; ++i)
  ^

The only way we can analyze what the program is doing, is by reading the generated assembly code.

Here is the full assembly listing:

    .file   "a.cpp"
    .section    .text$_ZNKSt5ctypeIcE8do_widenEc,"x"
    .linkonce discard
    .align 2
LCOLDB0:
LHOTB0:
    .align 2
    .p2align 4,,15
    .globl  __ZNKSt5ctypeIcE8do_widenEc
    .def    __ZNKSt5ctypeIcE8do_widenEc;    .scl    2;  .type   32; .endef
__ZNKSt5ctypeIcE8do_widenEc:
LFB860:
    .cfi_startproc
    movzbl  4(%esp), %eax
    ret $4
    .cfi_endproc
LFE860:
LCOLDE0:
LHOTE0:
    .section    .text.unlikely,"x"
LCOLDB1:
    .text
LHOTB1:
    .p2align 4,,15
    .def    ___tcf_0;   .scl    3;  .type   32; .endef
___tcf_0:
LFB1091:
    .cfi_startproc
    movl    $__ZStL8__ioinit, %ecx
    jmp __ZNSt8ios_base4InitD1Ev
    .cfi_endproc
LFE1091:
    .section    .text.unlikely,"x"
LCOLDE1:
    .text
LHOTE1:
    .def    ___main;    .scl    2;  .type   32; .endef
    .section    .text.unlikely,"x"
LCOLDB2:
    .section    .text.startup,"x"
LHOTB2:
    .p2align 4,,15
    .globl  _main
    .def    _main;  .scl    2;  .type   32; .endef
_main:
LFB1084:
    .cfi_startproc
    leal    4(%esp), %ecx
    .cfi_def_cfa 1, 0
    andl    $-16, %esp
    pushl   -4(%ecx)
    pushl   %ebp
    .cfi_escape 0x10,0x5,0x2,0x75,0
    movl    %esp, %ebp
    pushl   %edi
    pushl   %esi
    pushl   %ebx
    pushl   %ecx
    .cfi_escape 0xf,0x3,0x75,0x70,0x6
    .cfi_escape 0x10,0x7,0x2,0x75,0x7c
    .cfi_escape 0x10,0x6,0x2,0x75,0x78
    .cfi_escape 0x10,0x3,0x2,0x75,0x74
    xorl    %edi, %edi
    subl    $24, %esp
    call    ___main
L4:
    movl    %edi, (%esp)
    movl    $__ZSt4cout, %ecx
    call    __ZNSolsEi
    movl    %eax, %esi
    movl    (%eax), %eax
    subl    $4, %esp
    movl    -12(%eax), %eax
    movl    124(%esi,%eax), %ebx
    testl   %ebx, %ebx
    je  L15
    cmpb    $0, 28(%ebx)
    je  L5
    movsbl  39(%ebx), %eax
L6:
    movl    %esi, %ecx
    movl    %eax, (%esp)
    addl    $1000000000, %edi
    call    __ZNSo3putEc
    subl    $4, %esp
    movl    %eax, %ecx
    call    __ZNSo5flushEv
    jmp L4
    .p2align 4,,10
L5:
    movl    %ebx, %ecx
    call    __ZNKSt5ctypeIcE13_M_widen_initEv
    movl    (%ebx), %eax
    movl    24(%eax), %edx
    movl    $10, %eax
    cmpl    $__ZNKSt5ctypeIcE8do_widenEc, %edx
    je  L6
    movl    $10, (%esp)
    movl    %ebx, %ecx
    call    *%edx
    movsbl  %al, %eax
    pushl   %edx
    jmp L6
L15:
    call    __ZSt16__throw_bad_castv
    .cfi_endproc
LFE1084:
    .section    .text.unlikely,"x"
LCOLDE2:
    .section    .text.startup,"x"
LHOTE2:
    .section    .text.unlikely,"x"
LCOLDB3:
    .section    .text.startup,"x"
LHOTB3:
    .p2align 4,,15
    .def    __GLOBAL__sub_I_main;   .scl    3;  .type   32; .endef
__GLOBAL__sub_I_main:
LFB1092:
    .cfi_startproc
    subl    $28, %esp
    .cfi_def_cfa_offset 32
    movl    $__ZStL8__ioinit, %ecx
    call    __ZNSt8ios_base4InitC1Ev
    movl    $___tcf_0, (%esp)
    call    _atexit
    addl    $28, %esp
    .cfi_def_cfa_offset 4
    ret
    .cfi_endproc
LFE1092:
    .section    .text.unlikely,"x"
LCOLDE3:
    .section    .text.startup,"x"
LHOTE3:
    .section    .ctors,"w"
    .align 4
    .long   __GLOBAL__sub_I_main
.lcomm __ZStL8__ioinit,1,1
    .ident  "GCC: (i686-posix-dwarf-rev1, Built by MinGW-W64 project) 4.9.0"
    .def    __ZNSt8ios_base4InitD1Ev;   .scl    2;  .type   32; .endef
    .def    __ZNSolsEi; .scl    2;  .type   32; .endef
    .def    __ZNSo3putEc;   .scl    2;  .type   32; .endef
    .def    __ZNSo5flushEv; .scl    2;  .type   32; .endef
    .def    __ZNKSt5ctypeIcE13_M_widen_initEv;  .scl    2;  .type   32; .endef
    .def    __ZSt16__throw_bad_castv;   .scl    2;  .type   32; .endef
    .def    __ZNSt8ios_base4InitC1Ev;   .scl    2;  .type   32; .endef
    .def    _atexit;    .scl    2;  .type   32; .endef

I can barely even read assembly, but even I can see the addl $1000000000, %edi line. The resulting code looks more like

for(int i = 0; /* nothing, that is - infinite loop */; i += 1000000000)
    std::cout << i << std::endl;

This comment of @T.C.:

I suspect that it's something like: (1) because every iteration with i of any value larger than 2 has undefined behavior -> (2) we can assume that i <= 2 for optimization purposes -> (3) the loop condition is always true -> (4) it's optimized away into an infinite loop.

gave me the idea to compare the assembly code of the OP's code with the assembly code of the following code, which has no undefined behaviour.

#include <iostream>

int main()
{
    // changed the termination condition
    for (int i = 0; i < 3; ++i)
        std::cout << i*1000000000 << std::endl;
}

And, in fact, the correct code does have a termination condition.

    ; ...snip...
L6:
    mov ecx, edi
    mov DWORD PTR [esp], eax
    add esi, 1000000000
    call    __ZNSo3putEc
    sub esp, 4
    mov ecx, eax
    call    __ZNSo5flushEv
    cmp esi, -1294967296 // here it is
    jne L7
    lea esp, [ebp-16]
    xor eax, eax
    pop ecx
    ; ...snip...

Unfortunately, these are the consequences of writing buggy code.

Fortunately you can make use of better diagnostics and better debugging tools - that's what they are for:

  • enable all warnings

  • -Wall is the gcc option that enables all useful warnings with no false positives. This is a bare minimum that you should always use.

  • gcc has many other warning options, however, they are not enabled with -Wall as they may warn on false positives

  • Visual C++ unfortunately is lagging behind with the ability to give useful warnings. At least the IDE enables some by default.

  • use debug flags for debugging

    • for integer overflow -ftrapv traps the program on overflow,
    • Clang compiler is excellent for this: -fcatch-undefined-behavior catches a lot of instances of undefined behaviour (note: "a lot of" != "all of them")

I have a spaghetti mess of a program not written by me that needs to be shipped tomorrow! HELP!!!!!!111oneone

Use gcc's -fwrapv

This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation.

1 - this rule does not apply to "unsigned integer overflow", as §3.9.1/4 says that

Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.

and e.g. the result of UINT_MAX + 1 is mathematically defined - by the rules of arithmetic modulo 2^n

Flexo
milleniumbug
    I still don't really understand what is happening here... Why is `i` itself affected? In general undefined behaviour is not having these kind of strange side effects, after all, `i*100000000` should be a rvalue – vsoftco Jun 18 '14 at 23:33
    @vsoftco When the program executes something that has Undefined Behaviour, the C++ Standard does not pose *any* requirements on the *whole* program execution. This allows aggressive optimizations. – dyp Jun 18 '14 at 23:38
  • @dyp, thanks, I know that undefined behaviour can in principle perform a `rm -rf /`, however I don't get what kind of aggressive optimization is being performed here. Probably some asm code should clarify... – vsoftco Jun 18 '14 at 23:40
    I suspect that it's something like: (1) because every iteration with `i` of any value larger than 2 has undefined behavior -> (2) we can assume that `i <= 2` for optimization purposes -> (3) the loop condition is always true -> (4) it's optimized away into an infinite loop. – T.C. Jun 18 '14 at 23:54
    @vsoftco: What is going on is a case of [strength reduction](http://en.wikipedia.org/wiki/Strength_reduction), more specifically, [induction variable elimination](http://en.wikipedia.org/wiki/Induction_variable). The compiler eliminates the multiplication by emitting code that instead increments `i` by 1e9 each iteration (and changing the loop condition accordingly). This is a perfectly valid optimization under the "as if" rule as this program could not observe the difference were it well-behaving. Alas, it's not, and the optimization "leaks". – JohannesD Jun 19 '14 at 06:09
    @JohannesD nailed the reason this breaks. However, this is a bad optimization since the loop termination condition doesn't involve overflow. The use of strength reduction was okay -- I don't know what the multiplier in the processor would do with (4*100000000) that would be different with (100000000+100000000+100000000+100000000), and falling back on "it's undefined -- who knows" is reasonable. But replacing what should be a well-behaved loop, which executes 4 times and produces undefined results, with something that executes more than 4 times "because it's undefined!" is idiocy. – Julie in Austin Jun 19 '14 at 13:19
    @JulieinAustin While it may be idiotic to you, it's perfectly legal. On the positive side, the compiler warns you about it. – milleniumbug Jun 19 '14 at 13:33
    Not everything that's legal is a good idea. GCC is becoming more and more dangerous to use as it takes strictly legal optimization opportunities which are in obvious opposition to the programmer's intent (and decades of compiler tradition). It's apparently more important to look good on benchmarks than to be a useful compiler. – Russell Borogove Jun 19 '14 at 17:54
    @RussellBorogove How is it against the programmer intent? The programmer wants their code to run as fast as possible, be as clear as possible and do the right thing. The optimisation handles both the first and second case, and the third is impossible given the situation, so it warns you of that. I want my compiler to make my loop more computationally efficient, without having to write awkward looking source in order to optimise it. – deworde Jun 20 '14 at 09:12
    @JulieinAustin You seem to be suggesting that the next C++ standard split `undefined behaviour` into two categories: acceptable unspecified behaviour and unacceptable unspecified behaviour, and furthermore stating that emitting a warning isn't enough; instead compilers should limit their optimizational ability/pollute their codebase to give you a different sort of warning or halt compilation in the unacceptable case. Seems a little unreasonable and unlikely to pass a vote. Would it perhaps be better if you set you compiler to fail on certain warnings instead? –  Jun 20 '14 at 12:19
  • Last assembly output has `jne L7` but `L6` in the label. Is this a typo? – arielf Jun 25 '14 at 00:32
    @arielf This assembly output is only a fragment for demonstration. The full assembly listing has an L7 label elsewhere, but I cut it for clarity. Not sure if I should paste the entire thing, or maybe just leave "...cut for clarity" information. – milleniumbug Jun 25 '14 at 00:53
    To the downvoter: Is the answer "not useful"? Does it have factual errors? Please tell me how it's wrong. Any criticism can help me improve my answer. – milleniumbug Jun 25 '14 at 21:24
    @JulieinAustin: This optimization can speed up the code in many cases. If you want the compiler to not destroy the code (by assuming that signed integer overflow won't happen) which triggers undefined behavior, you can use `-fwrapv` in gcc. However, be aware that it kills many optimizations depending on signed integer overflow being undefined, and it's not proper C. If you disagree with this being undefined, don't use C or C++. – Konrad Borowski Jun 28 '14 at 18:16
    Just a hint for the future: if you want to be able to make sense of the assembler output of a compiler *always* use the `-Os` option. Unoptimized code is obfuscated with unnecessary stuff, code optimized for speed is obfuscated with repeated code. Both hide the structure from the human eye. Code optimized for space gives the essence of what the compiler understood. – cmaster - reinstate monica Jul 24 '14 at 16:48
  • @cmaster Thanks for the response. Your hint definitely makes sense (that is, I can keep the unoptimized, optimized for space and optimized for speed together and only compare the differences), so I'll keep it in mind. However, the code with `-Os` does not loop infinitely (at least, not on gcc 4.9), as the OP says it does, and the analysis of this was the point, so the `-O3` switch is justified in this case. – milleniumbug Jul 24 '14 at 17:20
  • @cmaster-reinstatemonica Why don't stackoverflow have a "pin" function!? THANKS for that comment! (the `-Os`) – William Martens Apr 18 '21 at 18:01
  • @deworde: What is needed is for the Standard to recognize situations where the behavior of some action would be defined by transitively application of parts of the Standard along with documentation for the implementation and execution environment, but where it may sometimes be useful for implementations to deviate from that in cases that would facilitate optimization *without interfering with what programmers need to do*. The Standard waives jurisdiction in such cases because compiler writers are expected to know more about their customer needs than the Committee could ever hope to. – supercat Sep 21 '22 at 20:38
  • @deworde: Unfortunately, some compiler writers interpret the Committee's waiver of jurisdiction as implying a judgment that nothing a compiler might choose to do would be expected to adversely affect any correct program, ignoring the fact that the category "non-portable or erroneous program constructs" includes program constructs which, while non-portable, would also be 100% correct when processed by implementations *that are designed to be suitable for the task at hand*. – supercat Sep 21 '22 at 20:40
  • @supercat If I've understood correctly, my only quibble is with "What is needed" – deworde Sep 26 '22 at 08:27
  • @deworde: I don't think the so-called "Standard" can actually be useful as a means of judging conformance without recognizing such situations. – supercat Sep 26 '22 at 14:51
  • @deworde: To my mind, a standard exercises normative authority in cases where failure of an otherwise-conforming X to do Y would render it non-conforming. If the Standard were to recognize situations such as described, along with categories of deviations from the canonical behavior, then it could in turn recognize a category of "selectively conforming programs" and "safely conforming implementations" such that any SCI that received a correct SCP would be required to either process it correctly or indicate *via defined means* a refusal to do so. If an SCI could process a program in a way... – supercat Sep 26 '22 at 17:00
  • ...that would not satisfy certain behavioral requirements, that would imply that the program in question is not an SCP satisfying those requirements. Conversely, if an implementation given an SCP that meets certain requirements were to generate code which neither meets those requirements nor indicates via defined means an inability or refusal to do so, that would imply that the implementation is not an SCI. Compare that to the present "Standard": in what cases could anything an otherwise-conforming implementation might do in response to a particular program render it non-conforming? – supercat Sep 26 '22 at 17:06

Short answer: gcc has specifically documented this problem; we can see that in the gcc 4.8 release notes, which say (emphasis mine going forward):

GCC now uses a more aggressive analysis to derive an upper bound for the number of iterations of loops using constraints imposed by language standards. This may cause non-conforming programs to no longer work as expected, such as SPEC CPU 2006 464.h264ref and 416.gamess. A new option, -fno-aggressive-loop-optimizations, was added to disable this aggressive analysis. In some loops that have known constant number of iterations, but undefined behavior is known to occur in the loop before reaching or during the last iteration, GCC will warn about the undefined behavior in the loop instead of deriving lower upper bound of the number of iterations for the loop. The warning can be disabled with -Wno-aggressive-loop-optimizations.

and indeed, if we use -fno-aggressive-loop-optimizations, the infinite loop behavior should cease, and it does in all the cases I have tested.

The long answer starts with knowing that signed integer overflow is undefined behavior by looking at the draft C++ standard section 5 Expressions paragraph 4 which says:

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [ Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —end note

We know that the standard says undefined behavior is unpredictable from the note that comes with the definition, which says:

[ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. —end note ]

But what in the world can the gcc optimizer be doing to turn this into an infinite loop? It sounds completely wacky. But thankfully gcc gives us a clue to figuring it out in the warning:

warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]
   std::cout << i*1000000000 << std::endl;
                  ^

The clue is -Waggressive-loop-optimizations: what does that mean? Fortunately for us, this is not the first time this optimization has broken code in this way, and we are lucky that John Regehr has documented a case in the article GCC pre-4.8 Breaks Broken SPEC 2006 Benchmarks, which shows the following code:

int d[16];

int SATD (void)
{
  int satd = 0, dd, k;
  for (dd=d[k=0]; k<16; dd=d[++k]) {
    satd += (dd < 0 ? -dd : dd);
  }
  return satd;
}

the article says:

The undefined behavior is accessing d[16] just before exiting the loop. In C99 it is legal to create a pointer to an element one position past the end of the array, but that pointer must not be dereferenced.

and later on says:

In detail, here is what’s going on. A C compiler, upon seeing d[++k], is permitted to assume that the incremented value of k is within the array bounds, since otherwise undefined behavior occurs. For the code here, GCC can infer that k is in the range 0..15. A bit later, when GCC sees k<16, it says to itself: “Aha– that expression is always true, so we have an infinite loop.” The situation here, where the compiler uses the assumption of well-definedness to infer a useful dataflow fact,

So what the compiler must be doing in some cases is assuming that, since signed integer overflow is undefined behavior, i must always be less than 4, and thus we have an infinite loop.

He explains this is very similar to the infamous Linux kernel null pointer check removal where in seeing this code:

struct foo *s = ...;
int x = s->f;
if (!s) return ERROR;

gcc inferred that since s was dereferenced in s->f, and since dereferencing a null pointer is undefined behavior, s must not be null, and therefore optimized away the if (!s) check on the next line.

The lesson here is that modern optimizers are very aggressive about exploiting undefined behavior, and will most likely only get more aggressive. Clearly, with just a few examples, we can see that the optimizer does things that seem completely unreasonable to a programmer but, in retrospect, from the optimizer's perspective, make sense.

Shafik Yaghmour
    I understand that this is what the compiler writer is doing (I used to write compilers and even an optimizer or two), but there are behaviors that are "useful" even though they are "undefined", and this march towards ever more aggressive optimization is just insanity. The construct you quote above is wrong, but optimizing out the error check is user-hostile. – Julie in Austin Jun 19 '14 at 13:34
    @JulieinAustin I agree this is pretty surprising behavior, saying that developers need to avoid undefined behavior is really only half the issue. Clearly the compiler also needs to provide better feedback to the developer as well. In this case a warning is produced although it is not really informative enough. – Shafik Yaghmour Jun 19 '14 at 13:51
    I think it's a good thing, I want better, faster code. UB should never be used. – paulm Jun 19 '14 at 18:06
    @paulm morally UB is clearly bad, but it is hard to argue against providing [better tools](http://stackoverflow.com/a/22700059/1708801) such as the [clang static analyzer](http://clang-analyzer.llvm.org/) to help developers catch UB and other problems before they impact production applications. – Shafik Yaghmour Jun 19 '14 at 19:23
  • @ShafikYaghmour Sure, but that should be an opt-in, as it increases compile time, which is another thing that you want to keep low. – deworde Jun 20 '14 at 09:07
    @ShafikYaghmour Also, if your developer is ignoring warnings, what are the chances they'll pay any attention to clang output? This issue can easily be caught by an aggressive "no unjustifed warnings" policy. Clang advisable but not required. – deworde Jun 20 '14 at 09:10
  • @deworde yes, compile time is definitely important, both tools I pointed out are optional. I also agree with a treat errors like warnings policy. – Shafik Yaghmour Jun 20 '14 at 09:30
  • @deworde: Flipping things around, what would be the problem with having an utility identify places where a program performance could be improved by adding __ASSUME* directives in cases where a compiler's assumptions would coincide with programmer intent, and then having an optimizer which would only try to optimize things thus marked? I would think that such a thing could greatly improve optimizer performance and safety, since it wouldn't have to chase down code paths which don't actually lead to any useful optimizations. – supercat Apr 23 '15 at 21:54
  • @JulieinAustin it is worth noting that gcc catches this case with `-fsanitize=undefined` ... I realized this [answering a similar question](http://stackoverflow.com/q/32506643/1708801). – Shafik Yaghmour Sep 14 '15 at 17:20
  • @ShafikYaghmour - I love GCC. I do not love its behavior ;) – Julie in Austin Sep 15 '15 at 01:00

tl;dr The code generates a test that integer + positive integer == negative integer. Usually the optimizer does not optimize this out, but in the specific case of std::endl being used next, the compiler does optimize this test out. I haven't figured out what's special about endl yet.


From the assembly code at -O1 and higher levels, it is clear that gcc refactors the loop to:

i = 0;
do {
    cout << i << endl;
    i += NUMBER;
} while (i != NUMBER * 4);

The biggest value that works correctly is 715827882, i.e. floor(INT_MAX/3). The assembly snippet at -O1 is:

L4:
movsbl  %al, %eax
movl    %eax, 4(%esp)
movl    $__ZSt4cout, (%esp)
call    __ZNSo3putEc
movl    %eax, (%esp)
call    __ZNSo5flushEv
addl    $715827882, %esi
cmpl    $-1431655768, %esi
jne L6
    // fallthrough to "return" code

Note, the -1431655768 is 4 * 715827882 in 2's complement.

Hitting -O2 optimizes that to the following:

L4:
movsbl  %al, %eax
addl    $715827882, %esi
movl    %eax, 4(%esp)
movl    $__ZSt4cout, (%esp)
call    __ZNSo3putEc
movl    %eax, (%esp)
call    __ZNSo5flushEv
cmpl    $-1431655768, %esi
jne L6
leal    -8(%ebp), %esp
   // fallthrough to "return" code

So the optimization that has been made is merely that the addl was moved higher up.

If we recompile with 715827883 instead then the -O1 version is identical apart from the changed number and test value. However, -O2 then makes a change:

L4:
movsbl  %al, %eax
addl    $715827883, %esi
movl    %eax, 4(%esp)
movl    $__ZSt4cout, (%esp)
call    __ZNSo3putEc
movl    %eax, (%esp)
call    __ZNSo5flushEv
jmp L2

Where there was cmpl $-1431655764, %esi at -O1, that line has been removed for -O2. The optimizer must have decided that adding 715827883 to %esi can never equal -1431655764.

This is pretty puzzling. Adding that to INT_MIN+1 does generate the expected result, so the optimizer must have decided that %esi can never be INT_MIN+1 and I'm not sure why it would decide that.

In the working example it seems it'd be equally valid to conclude that adding 715827882 to a number cannot equal INT_MIN + 715827882 - 2! (this is only possible if wraparound actually occurs), yet it does not optimize the test out in that example.


The code I was using is:

#include <iostream>
#include <cstdio>

int main()
{
    for (int i = 0; i < 4; ++i)
    {
        //volatile int j = i*715827883;
        volatile int j = i*715827882;
        printf("%d\n", j);

        std::endl(std::cout);
    }
}

If the std::endl(std::cout) is removed then the optimization no longer occurs. In fact replacing it with std::cout.put('\n'); std::flush(std::cout); also causes the optimization to not happen, even though std::endl is inlined.

The inlining of std::endl seems to affect the earlier part of the loop structure (I don't quite understand what that part is doing, but I'll post it here in case someone else does):

With original code and -O2:

L2:
movl    %esi, 28(%esp)
movl    28(%esp), %eax
movl    $LC0, (%esp)
movl    %eax, 4(%esp)
call    _printf
movl    __ZSt4cout, %eax
movl    -12(%eax), %eax
movl    __ZSt4cout+124(%eax), %ebx
testl   %ebx, %ebx
je  L10
cmpb    $0, 28(%ebx)
je  L3
movzbl  39(%ebx), %eax
L4:
movsbl  %al, %eax
addl    $715827883, %esi
movl    %eax, 4(%esp)
movl    $__ZSt4cout, (%esp)
call    __ZNSo3putEc
movl    %eax, (%esp)
call    __ZNSo5flushEv
jmp L2                  // no test

With my manual inlining of std::endl, -O2:

L3:
movl    %ebx, 28(%esp)
movl    28(%esp), %eax
addl    $715827883, %ebx
movl    $LC0, (%esp)
movl    %eax, 4(%esp)
call    _printf
movl    $10, 4(%esp)
movl    $__ZSt4cout, (%esp)
call    __ZNSo3putEc
movl    $__ZSt4cout, (%esp)
call    __ZNSo5flushEv
cmpl    $-1431655764, %ebx
jne L3
xorl    %eax, %eax

One difference between these two is that %esi is used in the original, and %ebx in the second version; is there any defined difference in semantics between %esi and %ebx in general? (I don't know much about x86 assembly).

M.M
  • It'd be good to find out exactly what the optimizer's logic was though, it's not clear to me at this stage why some cases have the test optimized out and some don't – M.M Jun 21 '14 at 02:07

Another example of this error being reported in gcc is when you have a loop that executes for a constant number of iterations, but you are using the counter variable as an index into an array that has less than that number of items, such as:

int a[50], x, i;

for (i = 0; i < 1000; i++) x = a[i];

The compiler can determine that this loop will try to access memory outside of the array 'a', and complains with the rather cryptic message:

iteration xxu invokes undefined behavior [-Werror=aggressive-loop-optimizations]

Ryan Bemrose
Ed Tyler
  • Even more cryptic is that the message is emitted only when optimization is turned on. M$ VB message "Array out of bound" is for dummies? – Ravi Ganesh Mar 06 '19 at 05:01

What I cannot get is why i value is broken by that overflow operation?

Integer overflow occurs in the 4th iteration (for i = 3), and signed integer overflow invokes undefined behavior. In that case nothing can be predicted: the loop may iterate only 4 times, run forever, or do anything else! The result may vary from compiler to compiler, or even between different versions of the same compiler.

C++11: §1.3.24 undefined behavior:

behavior for which this International Standard imposes no requirements
[ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. —end note ]

haccks
  • @bits_international; Yes. – haccks Jun 18 '14 at 23:37
  • You're right, it's fair to explain why I downvoted. The information in this answer is correct, but it is not educational and it completely ignores the elephant in the room: the breakage apparently happens *at a different place* (stopping condition) than the operation causing the overflow. The mechanics of how things are broken in this specific case are not explained, even though this is the core of this question. It's a typical bad teacher situation where the teacher's answer not only doesn't address the core of the problem, it discourages further questions. It almost sounds like ... – Szabolcs Jun 19 '14 at 22:43
  • "I see this is undefined behaviour, and from this point on I don't care how or why it breaks. The standard allows it to break. No further questions." You might not have meant it like that, but it seems like that. I am hoping to see less of this (unfortunately common) attitude on SO. This is not practically useful. If you get user input, it is not reasonable to check for overflow *after every single signed integer operation*, even if the standard says that *any other* part of the program may blow up because of it. Understanding *how* it breaks *does* help avoid problems like this in practice. – Szabolcs Jun 19 '14 at 22:44
  • @Szabolcs: It may be best to think of C as two languages, one of which was designed to allow simple compilers to achieve reasonably-efficient executable code with the aid of programmers who exploit constructs which would be reliable on their intended target platforms but not others, and were consequently ignored by the Standards committee, and a second language which excludes all such constructs for which the Standard does not mandate support, for the purpose of allowing compilers to apply additional optimizations which may or may not outweigh the ones programmers have to give up. – supercat Aug 16 '16 at 19:32
  • @Szabolcs: Since embedded or systems programming is in general impossible without exploiting concepts that don't work identically on all systems, and are thus ignored by the Standard, the latter language is totally useless for systems programming. Unfortunately, the authors of gcc seem to show little interest in making optimizations compatible with a version of the language which is suitable for embedded or systems programming. – supercat Aug 16 '16 at 19:39
  • @Szabolcs "*If you get user input, it is not reasonable to check for overflow after every single signed integer operation*" - correct because at that point it's too late. You have to check for overflow *before* every single signed integer operation. – melpomene Sep 10 '18 at 10:21