
I was learning about undefined behaviour and stumbled upon this code without any clear explanation:

#include <stdio.h>
#include <limits.h>

int foo(int x) {
    printf("% d\n", x);      // 2147483647
    printf("% d\n", x + 1);  // -2147483648  overflow
    return (x + 1) > x;      // 1, but how????
}

int main(void) {
    printf("% d\n", INT_MAX);                  // 2147483647
    printf("% d\n", INT_MAX + 1);              // -2147483648  overflow
    printf("% d\n", (INT_MAX + 1) > INT_MAX);  // 0  makes sense, since -ve < +ve
    printf("% d\n", foo(INT_MAX));             // 1
    return 0;
}

When compiling with gcc, the compiler issues a warning:

warning: integer overflow in expression of type 'int' results in '-2147483648'

So, clearly the value of `INT_MAX+1` is negative, which explains why `(INT_MAX+1) > INT_MAX` evaluates to 0.

But why (or how) does `(x+1) > x` evaluate to 1 for `x = INT_MAX` in `foo(...)`?

asked by avm (edited by dbush)
  • Where is `MAX_INT` defined, and what is it defined as? – Scott Hunter May 28 '21 at 18:07
  • @ScottHunter inside the limits.h header. I updated the code with the same – avm May 28 '21 at 18:09
  • No, it isn't. `INT_MAX`, which you use repeatedly, is. – Scott Hunter May 28 '21 at 18:10
  • The fact that this is undefined behavior is what lets the compiler replace `(x + 1) > x` with `true` when optimization is turned on (and if `x` is a signed integral type or an integral type with lower rank than `int`) – Ben Voigt May 28 '21 at 18:11
  • https://stackoverflow.com/questions/16188263/is-signed-integer-overflow-still-undefined-behavior-in-c – Cory Kramer May 28 '21 at 18:11
  • Because undefined behavior. – Pete Becker May 28 '21 at 18:11
  • [Related](https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior), though I wouldn't call it a duplicate. It does have some useful background on UB though – Silvio Mayolo May 28 '21 at 18:12
  • Asking for explanations about what happens with undefined behavior is like asking "how long is a rope". There is no meaningful answer. – Pete Becker May 28 '21 at 18:12
  • Very close to a duplicate of https://stackoverflow.com/q/6856227/103167 – Ben Voigt May 28 '21 at 18:13
  • Off-site related: https://codereview.stackexchange.com/a/37178/2150 – Ben Voigt May 28 '21 at 18:15
  • another related: https://stackoverflow.com/q/2633661/103167 – Ben Voigt May 28 '21 at 18:19
  • Note that with compiler warnings enabled for **clang**, two of the problems are noticed by the compiler and a `warning` is output during compilation, and with *sanitize* enabled, the other two integer overflows are detected at runtime (plus the same two that were detected at compile time). That being said, compilers are not burdened with detecting all undefined behavior situations. – Eljay May 28 '21 at 19:07

2 Answers


When a program exhibits undefined behavior, the C standard places no requirements on what the program will do. It may crash, it may produce strange output, or it may appear to work properly.

In fact, compilers will often work under the assumption that a program does not contain undefined behavior.

In the case of this expression:

(x + 1) > x

Given that x has type int, the compiler knows that signed overflow is UB and works under the assumption that it will not occur. Under that assumption, there is no value of x for which the expression could be false, so the compiler can optimize away the comparison and replace it with the value 1.
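
Conceptually, the optimizer is then free to treat foo as if it had been written like this (a sketch of the transformation, not gcc's actual internal form):

int foo(int x) {
    printf("% d\n", x);
    printf("% d\n", x + 1);  /* the addition and its printout still happen */
    return 1;                /* the comparison has been folded to a constant */
}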

When I run this program under gcc 4.8.5, I get the following results with -O0 and -O1:

 2147483647
-2147483648
 0
 2147483647
-2147483648
 0

And the following with -O2 and -O3:

 2147483647
-2147483648
 0
 2147483647
-2147483648
 1

Then looking at the assembly for foo in the latter case:

foo:
.LFB11:
    .file 1 "x1.c"
    .loc 1 4 0
    .cfi_startproc
.LVL0:
    pushq   %rbx                // first call to printf
    .cfi_def_cfa_offset 16
    .cfi_offset 3, -16
    .loc 1 5 0
    movl    %edi, %esi
    .loc 1 4 0
    movl    %edi, %ebx
    .loc 1 5 0
    xorl    %eax, %eax
    movl    $.LC0, %edi
.LVL1:
    call    printf
.LVL2:
    .loc 1 6 0                  // second call to printf
    leal    1(%rbx), %esi
    movl    $.LC0, %edi
    xorl    %eax, %eax
    call    printf
.LVL3:
    .loc 1 8 0                  // return value
    movl    $1, %eax
    popq    %rbx
    .cfi_def_cfa_offset 8
.LVL4:
    ret
    .cfi_endproc

We can see that's exactly what the compiler did: it optimized away the comparison and always returns 1.

This illustrates how compilers can make use of undefined behavior to apply various optimizations.
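
If the goal is to test whether x + 1 would overflow, the check has to be written so that every operation stays defined. A minimal sketch (the name foo_checked is mine, not from the question):

#include <stdio.h>
#include <limits.h>

int foo_checked(int x) {
    if (x == INT_MAX)        /* x + 1 would overflow; report it directly */
        return 0;
    return (x + 1) > x;      /* no overflow possible here, so always 1 */
}

int main(void) {
    printf("%d\n", foo_checked(INT_MAX));  /* prints 0 at any optimization level */
    return 0;
}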

answered by dbush
  • So if a compiler can detect undefined behavior then why not throw an exception? This way UB can be avoided to produce unexpected results which we can't control? – avm May 28 '21 at 18:18
  • Why is there no UB associated with `INT_MAX+1 > INT_MAX`? – avm May 28 '21 at 18:20
  • @avm: Other languages can and do do that. It slows down the code to always be checking for overflow. C does not do extra work that you didn't ask for. – Ben Voigt May 28 '21 at 18:20
  • @avm The overhead of noticing undefined behaviour would be a massive loss. By design, C++ assumes you know what you're doing. – Vincent Fourmond May 28 '21 at 18:20
  • @avm It doesn't detect UB. It just assumes it won't happen. That's part of what makes C fast. It's up to the programmer to not introduce UB. And `INT_MAX+1 > INT_MAX` *is* UB. The fact that you get the result you expect doesn't change that. – dbush May 28 '21 at 18:20
  • @VincentFourmond But if we know for sure that a code can produce UB then shouldn't we handle it by maybe throwing an exception. Why leave a door to hell open when we can keep it locked? For example [front](https://en.cppreference.com/w/cpp/container/array/front) documents UB when container is empty which can be easily handled by throwing out_of_range exception. – avm May 28 '21 at 18:35
  • @avm • **undefined behavior** is not a compiler problem, it's a programmer problem. C and C++ are not nanny languages. They presume the programmer knows what they are doing, never lies to the compiler, and never has *undefined behavior* in their code. In the real world, this translates into an abundance of bugs (*job security*). – Eljay May 28 '21 at 18:38
  • @avm Compilers are not required to diagnose undefined behavior. They may in certain cases, or they can create extensions that remove certain instances of UB, but such cases are typically non-portable. – dbush May 28 '21 at 18:40
  • That said, in cases where the error is detectable at compile time you'll often get a polite warning message saying you should take a closer look at a piece of code and maybe a hint suggesting what you could do instead. But not always. Example: https://godbolt.org/z/Yq4cqeoPa – user4581301 May 28 '21 at 18:55
  • A warning is a kind of exception -- it's just one you can easily ignore (by not using `-Werror` on the compile). The warning message even tells you exactly what will happen if you do ignore it (it will wrap the addition to the large negative value). – Chris Dodd May 28 '21 at 19:01
  • @avm you can define the behavior with [`-fwrapv`](https://stackoverflow.com/q/47232954/995714) or [`-ftrapv`](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html). The output will be 0 as expected when wrapping around with `-fwrapv`: https://godbolt.org/z/56WecsePd. You can also detect overflow at run time with [`-fsanitize=signed-integer-overflow`](https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html): https://godbolt.org/z/1PGb4xEcx – phuclv May 29 '21 at 01:52
  • @Eljay: Why did the authors of the C Standard say that they intended that Undefined Behavior, among other things, "identify areas of 'conforming language extension'" if they expected that programs would never do anything whose behavior wasn't mandated by the Standard? – supercat May 29 '21 at 15:16
  • @supercat • A C or C++ implementation can provide an *implementation defined behavior* that in the C standard or C++ standard is undefined behavior. Such code is not portable. A *conforming language extension* means the implementor may augment the language by providing a definition of the officially undefined behavior. – Eljay May 29 '21 at 15:34
  • @Eljay: The Standard only uses the term "implementation-defined" to describe constructs which *all* implementations are required to document. For an implementation to document something it's required to document would hardly be a "conforming language extension". If integer overflow were IDB, that would mean that any potential traps from integer overflow would be characterized as "observable side effects", which would forbid what would otherwise be useful optimizations such as omitting calculations which might overflow but whose results would otherwise be ignored. – supercat May 29 '21 at 19:03
  • @Eljay: I'm not sure where the myth comes from that the Standard uses the term "Implementation-Defined Behavior" to describe actions which 90% of implementations were expected to process in at least somewhat consistent fashion, but where it might be impractical for some implementations to behave consistently. Such a myth, however, is supported neither by the Rationale, nor by the Standard's choice of when to use that phrase. Given that the C Standard was written to describe a language that was *already in use*, characterizing some formerly-defined constructs as UB could make the language... – supercat May 29 '21 at 19:13
  • ...implementable on some platforms where it would have been impractical, with the caveat that programs relying upon those constructs wouldn't work *on the newly-available platforms*. There is zero evidence to suggest that it was intended to imply that constructs shouldn't continue to be usable on the platforms that supported them. Had anyone realized the Standard would have been interpreted in that fashion, it would have been soundly rejected. – supercat May 29 '21 at 19:15
  • @Eljay: What do you think the authors of the Standard were referring to by the phrase "certain popular extensions" near the top of page 7 of http://www.open-std.org/jtc1/sc22/wg14/www/docs/n897.pdf if it wasn't intended to include the fact that many pre-standard dialects (including the language described by the 1974 C Reference Manual or K&R 1st Edition) defined the behavior of constructs which C89 reclassified as Undefined Behavior? – supercat May 29 '21 at 19:21
  • @supercat • I'm not sure what you are getting at; *implementation-defined* ≠ *conforming language extension*. – Eljay May 30 '21 at 11:06
  • @Eljay: When the Standard was being drafted, there were some actions which would be processed meaningfully by many implementations but not all. Arguments about whether such actions should behave meaningfully were usually resolved by having the Standard classify them as "Undefined Behavior", *but allowing implementations that would have defined their behaviors to continue doing so*. The only people expected to care about whether the Standard defined such actions would be those working with platforms where it would be impractical to support the commonplace behavior. – supercat May 30 '21 at 16:15
  • @Eljay: In their published Rationale document, the authors of the Standard used the phrases "conforming language extension" and "popular extension" to refer to actions which the Standard classified as "Undefined Behavior", but which implementations usefully defined anyway. Note that the phrase "Implementation-Defined Behavior" as used in the Standard refers exclusively to behaviors which implementations were *required* by the Standard to document. There should perhaps have been a term for behaviors which quality implementations should document *to the extent practical*, but... – supercat May 30 '21 at 16:20
  • ...C89 deliberately avoided giving any recommendations that weren't requirements. – supercat May 30 '21 at 16:22

When the Standard was written, compilers for conventional architectures would often perform integer arithmetic in wraparound two's-complement fashion, but there were times when doing something else might be more useful. As a couple of examples:

  1. If a program was known not to deliberately cause integer overflows, having an implementation trap on overflow would be less bad than having it produce output that was superficially valid but wrong.

  2. Even on commonplace platforms, it was sometimes advantageous to perform arithmetic as though using a wider-than-specified type. For example, on the 8086, the multiply instruction takes two 16-bit operands and produces a 32-bit result, so when performing a computation like `int32a = int16a*int16b + int32b;`, keeping the 32-bit result of the multiplication would be cheaper than using a sign-extension instruction to promote the bottom 16 bits of the result to 32 bits. Additionally, that abstraction model allows many kinds of expressions to be simplified, such as replacing `x*30/15` with `x*2`, or (as in the question's example) `x+y > x` with `y > 0`; both simplifications are sketched below.
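
A minimal sketch of both points, assuming a typical target where int is 32 bits (the function names are mine, for illustration only):

#include <stdint.h>

/* The int16_t operands promote to int, so the multiply already yields a
   full 32-bit product; no separate sign-extension of a truncated 16-bit
   result is needed. */
int32_t mul_acc(int16_t a, int16_t b, int32_t acc) {
    return a * b + acc;
}

/* Because the compiler may assume x*30 never overflows, it is allowed to
   fold this whole expression to x * 2. */
int scaled(int x) {
    return x * 30 / 15;
}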

Rather than trying to guess at all the ways it might be useful for an implementation to handle integer overflow, or risk preventing implementations from treating integer overflow in whatever fashion their customers would find most useful, the Standard lets implementations choose whatever method they think most useful. The authors of gcc have decided that the most useful way to process integer overflow is to use it to draw extended inferences that aren't bound by normal laws of causality.

Consider, for example:

unsigned arr[32771];
unsigned mul_mod_32768(unsigned short x, unsigned short y)
{
    /* Note that the authors of the Standard specified that the multiply
       here happens as signed, because--according to the Rationale--they
       expected that commonplace implementations would process signed and
       unsigned math identically in cases like this! */
    return (x * y) & 0x7FFFu;
}
void test(unsigned short n)
{
    unsigned total = 0;
    for (unsigned short i=32768; i < n; i++)
    {
        total += mul_mod_32768(i, 65535);
    }
    if (n < 32770)
        arr[n] = total;
}

At optimization level 2 or 3, gcc will generate code for test() that is precisely equivalent to:

void test(unsigned short n)
{
    arr[n] = 0;
}

If n is 32768 or less, the loop won't run at all, total will be zero, and total will be stored into arr[n]. If n is 32769, the loop runs once, adding mul_mod_32768(32768, 65535), which is 0, to total, which is then stored into arr[n]. If n is 32770 or greater, the iteration with i equal to 32769 would compute 32769*65535, which exceeds INT_MAX, so the Standard imposes no requirements on those cases; gcc therefore processes them the same way as the others, blindly storing zero into arr[n].
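
For contrast, a hedged sketch of a version that avoids the inference entirely (the _fixed name is mine): forcing the multiply into unsigned arithmetic makes any wraparound well defined, so gcc can no longer conclude that n must be less than 32770, and the bounds check in test() has to survive.

unsigned mul_mod_32768_fixed(unsigned short x, unsigned short y)
{
    /* (unsigned)x * y is an unsigned multiply, where wraparound is fully
       defined, so no UB-based inference about the caller is possible */
    return ((unsigned)x * y) & 0x7FFFu;
}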

The Standard deliberately makes no attempt to forbid implementations which are specialized for particular narrow purposes from behaving in ways that would make them unsuitable for many others. The behavior of gcc here may be suitable for use with programs that will process data exclusively from trustworthy sources, but that doesn't imply that it should be viewed as suitable for anything else. Unfortunately, the language clang and gcc seek to process is very different from the language the C Standards Committee was chartered to describe.

answered by supercat