1

I have found that this code causes a startling error in the gnu C++ compiler when it is optimizing.

#include <stdio.h>

int main()
{
    int a = 333666999, b = 0;
    for (short i = 0; i<7; ++i)
    {
        b += a; 
        printf("%d  ", b);
    }
    return 9;
}

To compile using g++ -Os fail.cpp the executable does not print seven numbers, it goes on forever, printing and printing. I am using -

-rwxr-xr-x 4 root root 700388 Jun  3  2013 /usr/bin/g++

Is there a later corrected version?

sepp2k
  • 363,768
  • 54
  • 674
  • 675
Clive
  • 269
  • 1
  • 14
  • Please show at least the truncated output. That should give an idea of what's going on. – cigien Nov 10 '20 at 15:30
  • 2
    This probably overflows `int`, which is UB, so the compiler is technically not violating the standard. – Brian Bi Nov 10 '20 at 15:32
  • 1
    variable `b` should be at least 8 bytes, change it to `long long b` – Harry Nov 10 '20 at 15:32
  • @Harry `a` is not the problem, it overflows in `b`. – mch Nov 10 '20 at 15:33
  • cigien - 333666999 667333998 1001000997 1334667996 1668334995 2002001994 -1959298303 -1625631304 -1291964305 -958297306 -624630307 -290963308 42703691 376370690 710037689 1043704688 1377371687 1711038686 2044705685 -1916594612 -1582927613 -1249260614 -915593615 -581926616 -248259617 85407382 419074381 – Clive Nov 10 '20 at 15:34
  • See my answer at https://stackoverflow.com/a/61444851/634919. The bug is in your code, not the compiler. If you want wraparound you can use `-fwrapv` – Nate Eldredge Nov 10 '20 at 15:34
  • Please add all information to the question, instead of a comment. – cigien Nov 10 '20 at 15:35
  • Overflowing by adding numerically 4 byte number to 4 byte number is NOT the problem. There is no error caused by that. The problem is in the compiler, not my program. When it is not asked to optimize it works fine. The compiler removes the short i and then mistakenly misplaces its offsets to the variables in the executable. This is definitely an error in the compiler. But it's old, and I wondered if there was a later version where this had been corrected? – Clive Nov 10 '20 at 15:42

2 Answers2

8

The compiler is very, very rarely wrong. In this case, b is overflowing, which is undefined behaviour for signed integers:

$ g++ --version
g++ (GCC) 10.2.0
...
$ g++ -Os -otest test.cpp
test.cpp: In function ‘int main()’:
test.cpp:8:11: warning: iteration 6 invokes undefined behavior [-Waggressive-loop-optimizations]
    8 |         b += a;
      |         ~~^~~~
test.cpp:6:24: note: within this loop
    6 |     for (short i = 0; i<7; ++i)
      |                       ~^~

And if you invoke undefined behaviour, the compiler is free to do whatever it likes, including making your program never terminate.


Edit: Some people seem to think that the UB should only affect the value of b, but not the loop iteration. This is not according to the Standard (UB can cause literally anything to happen) but it's a reasonable thought, so let's look at the generated assembly to see why the loop doesn't terminate.

First without -Os:

.LC0:
        .string "%d  "
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-12], 333666999
        mov     DWORD PTR [rbp-4], 0
        mov     WORD PTR [rbp-6], 0
.L3:
        cmp     WORD PTR [rbp-6], 6      # Compare i to 6
        jg      .L2                      # If greater, jump to end
        mov     eax, DWORD PTR [rbp-12]
        add     DWORD PTR [rbp-4], eax
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        movzx   eax, WORD PTR [rbp-6]
        add     eax, 1
        mov     WORD PTR [rbp-6], ax
        jmp     .L3
.L2:
        mov     eax, 9
        leave
        ret

Then with -Os:

.LC0:
        .string "%d  "
main:
        push    rbx
        xor     ebx, ebx
.L2:
        add     ebx, 333666999
        mov     edi, OFFSET FLAT:.LC0
        xor     eax, eax
        mov     esi, ebx
        call    printf
        jmp     .L2

The comparison and jump instructions are completely gone. Ironically, the compiler did exactly what you asked it to do: optimize for size, so remove as many instructions as it can while obeying the C++ standard. -O3 and -O2 generate the exact same code as -Os here.

-O1 generates a very interesting output:

.LC0:
        .string "%d  "
main:
        push    rbx
        mov     ebx, 0
.L2:
        add     ebx, 333666999
        mov     esi, ebx
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        cmp     ebx, -1959298303
        jne     .L2
        mov     eax, 9
        pop     rbx
        ret

Here, the compiler optimized away the loop counter i and just compares the value of b to its final value after 7 iterations, using the fact that signed overflow happens according to two's complement on this platform! Cheeky, isn't it? :)

Thomas
  • 174,939
  • 50
  • 355
  • 478
  • Thomas - My compiler is 4.8.1 and does not provide this warning. Thanks for the info. – Clive Nov 10 '20 at 15:52
  • This is certainly interesting information, but IMO *completely nuts*. The guys who wrote the standard need some spanking. – Sven Nilsson Nov 10 '20 at 15:54
  • Thomas - So if I make a and b unsigned and add them repeatedly (no matter how high) then it will be 'defined' behavior, and not cause warnings, and not cause errors in operation? – Clive Nov 10 '20 at 16:25
  • @Sven Nilsson. - I agree. If the standard says that an overflow of signed integers is 'undefined' behavior, but an overflow of unsigned is 'defined', that's nuts. Either way, putting out a warning of "undefined behavior" does not excuse the compiler from affecting the loop iterator (the 'i' in my program above). That should not have been damaged, whether it was optimized out or not. Since it's only a 'warning' not an 'error' the value it writes back to 'b' in memory should only involve the 4 bytes. ...continued – Clive Nov 10 '20 at 16:49
  • In other words, because of that warning we might not be allowed to assume 2's complement overflow rules. But we should certainly be allowed to expect the loop variable to be unaffected. – Clive Nov 10 '20 at 16:49
  • @Clive Yes, signed overflow is well-defined and your code will terminate as expected. C++ is full of [footguns](https://en.wiktionary.org/wiki/footgun) like this; that may be nuts but it's the way it is, and probably the way it will always remain. You could consider using a more modern language like Rust instead. – Thomas Nov 11 '20 at 08:54
  • I dug a bit deeper out of curiosity and made some edits. – Thomas Nov 11 '20 at 09:12
-1

I am using g++ version 4.8.1. Thomas has version 10.2.0 which evidently puts out a warning about "undefined behavior" when adding two signed integers. However, being only a warning still goes ahead and compiles the program. In all circumstances though, the "undefined behavior" should only be concerning the integers being added. In practice those integers in fact do abide by the 2's complement expected result. The "undefined behavior" should not overwrite other variables in the program. Otherwise the executable cannot be trusted at all. And if it cannot be trusted it shouldn't be compiled. Perhaps there is an even later version of the gnu compiler that works correctly when optimizing?

Clive
  • 269
  • 1
  • 14
  • 1
    That is wrong. If your program has undefined behaviour is no required to do anything meaningful, according to this: https://en.cppreference.com/w/cpp/language/ub – gerum Nov 11 '20 at 08:12