
For example, given the following code:

int f(int n)
{
    if (n < 0)
        return 0;
    n = n + 100;
    if (n < 0)
        return 0;
    return n;
}

Assuming you pass in a number that is very close to integer overflow (less than 100 away from INT_MAX), will the compiler produce code that gives you a negative return value?

Here is an excerpt about this issue from "The Descent to C" by Simon Tatham:

"The GNU C compiler (gcc) generates code for this function which can return a negative integer, if you pass in (for example) the maximum represent able ‘int’ value. Because the compiler knows after the first if statement that n is positive, and then it assumes that integer overflow does not occur and uses that assumption to conclude that the value of n after the addition must still be positive, so it completely removes the second if statement and returns the result of the addition unchecked."

It got me wondering if the same issue existed in C++ compilers, and if I should be careful that my integer overflow checks aren't skipped.
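
For reference, here is the kind of harness I would use to test it (the output is of course not reliable, since the addition overflows and that is exactly the undefined behaviour in question):

#include <climits>
#include <cstdio>

int f(int n)
{
    if (n < 0)
        return 0;
    n = n + 100;       // overflows for n > INT_MAX - 100: undefined behaviour
    if (n < 0)
        return 0;      // the compiler may assume this branch is unreachable
    return n;
}

int main()
{
    std::printf("%d\n", f(INT_MAX));
    return 0;
}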

  • When you have undefined behavior the compiler can make assumptions that may seem very odd, but look at this [example here](http://stackoverflow.com/questions/24296571/why-does-this-loop-produce-warning-iteration-3u-invokes-undefined-behavior-an/24297811#24297811) to see the compiler turning a finite loop into an infinite loop due to optimizations around undefined behavior. – Shafik Yaghmour Aug 06 '14 at 03:10
  • What's funny here is that the standard deliberately defines something that is perfectly well-defined on the hardware (on all _real_, non-fictional hardware, anyway) as undefined behavior only so compilers can do that kind of optimization. Although INT_MAX+1 really equals INT_MIN on any CPU that you're going to be able to find, _assuming that it doesn't_ and saying that this is undefined lets you legally optimize out code like the above, lets you consider `x+1>x` as "always true", or lets you assert that loop iterations are finite (see the small example after these comments). – Damon Aug 06 '14 at 14:06
  • Maybe they felt that this question had been adequately answered previously and I did not search well enough to find it? – SaintAnti Aug 07 '14 at 18:25
  • @Damon: What is your basis for that assertion? In 1989, the behavior was consistent on *most* hardware, but not all, and the authors of the Standard noted in the rationale that a major motivating factor for making short unsigned values promote as signed was the fact that the majority of implementations had defined silent wraparound overflow semantics. – supercat May 09 '16 at 20:02
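
To make the `x+1>x` point in the comments concrete, here is a tiny function one can feed to an optimizing compiler (my own illustration, not from the thread); because signed overflow is undefined, gcc and clang at -O2 typically fold the comparison to a constant:

// The compiler may assume x + 1 never overflows,
// so x + 1 > x can be treated as always true.
bool always_true(int x)
{
    return x + 1 > x;
}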

1 Answer


Short Answer

Will a compiler definitely optimize away the check in your example? We can't say for all cases, but we can test gcc 4.9 using the Godbolt interactive compiler with the following code (see it live):

int f(int n)
{
    if (n < 0) return 0;

    n = n + 100;

    if (n < 0) return 0;

    return n;
}

int f2(int n)
{
    if (n < 0) return 0;

    n = n + 100;

    return n;
}

and we see that it generates identical code for both versions, which means it is indeed eliding the second check:

f(int):  
    leal    100(%rdi), %eax #, tmp88 
    testl   %edi, %edi  # n
    movl    $0, %edx    #, tmp89
    cmovs   %edx, %eax  # tmp88,, tmp89, D.2246
    ret
f2(int):
    leal    100(%rdi), %eax #, tmp88
    testl   %edi, %edi  # n
    movl    $0, %edx    #, tmp89 
    cmovs   %edx, %eax  # tmp88,, tmp89, D.2249
    ret
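
This matches the branchless form noted in the comments below: both functions are effectively reduced to something like the following (a paraphrase of the generated code, not part of the original question):

int f_reduced(int n)
{
    // Single unchecked addition selected against zero;
    // no post-addition overflow check remains.
    return n < 0 ? 0 : n + 100;
}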

Long Answer

When your code exhibits undefined behavior, or relies on potential undefined behavior (in this example, signed integer overflow), then yes, the compiler can make assumptions and optimize around them: it can assume there is no undefined behavior and optimize according to that assumption. The most infamous example is probably the removal of a null check in the Linux kernel. The code was as follows:

struct foo *s = ...;
int x = s->f;
if (!s) return ERROR;
... use s ...

The logic used was that since s was dereferenced, it must not be a null pointer (otherwise that would be undefined behavior), and therefore the compiler optimized away the if (!s) check. The linked article says:

The problem is that the dereference of s in line 2 permits a compiler to infer that s is not null (if the pointer is null then the function is undefined; the compiler can simply ignore this case). Thus, the null check in line 3 gets silently optimized away and now the kernel contains an exploitable bug if an attacker can find a way to invoke this code with a null pointer.
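
A minimal sketch of the corrected ordering (my paraphrase for illustration, not the actual kernel code; the struct layout and ERROR value are placeholders): test the pointer before any dereference, so the compiler has no earlier dereference from which to infer that s is non-null:

struct foo { int f; };          // hypothetical layout for illustration
const int ERROR = -1;           // placeholder error code

int use_foo(struct foo *s)
{
    if (!s) return ERROR;       // check first: nothing dereferenced yet
    int x = s->f;               // safe: s is known to be non-null here
    // ... use s and x ...
    return x;
}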

This applies equally to C and C++, both of which have similar language around undefined behavior. In both cases the standard tells us that the results of undefined behavior are unpredictable, although what is specifically undefined in each language may differ. The draft C++ standard defines undefined behavior as follows:

behavior for which this International Standard imposes no requirements

and includes the following note (emphasis mine):

Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed.

The draft C11 standard has similar language.

Proper signed overflow check

Your check is not the proper way to guard against signed integer overflow: you need to check before you execute the operation, and not execute the operation if it would cause overflow. CERT has a good reference on how to prevent signed integer overflow for the various operations. For the addition case it recommends the following:

#include <limits.h>

void f(signed int si_a, signed int si_b) {
  signed int sum;
  if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
      ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
    /* Handle error */
  } else {
    sum = si_a + si_b;
  }
}

If we plug this code into godbolt we can see that the checks are not elided, which is the behavior we would expect.
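
As an alternative to the explicit range comparisons (this is not from the CERT reference above; it relies on a GCC/Clang builtin, so it is not portable to every compiler), __builtin_add_overflow performs the addition and reports overflow without ever evaluating an overflowing signed expression:

#include <limits.h>
#include <stdio.h>

// Sketch using the GCC/Clang builtin; the wrapper name is my own.
int checked_add(int a, int b, int *out)
{
    // __builtin_add_overflow returns true if a + b would overflow int.
    if (__builtin_add_overflow(a, b, out))
        return -1;    // overflow: *out holds the wrapped value, do not use it
    return 0;
}

int main()
{
    int sum;
    if (checked_add(INT_MAX, 100, &sum) != 0)
        printf("overflow detected\n");
    return 0;
}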

Shafik Yaghmour
  • FWIW, it's optimized away with `g++ -O3`. The [generated code](http://goo.gl/ElvE9m) is identical to the code generated for `int f(int n) { return n < 0? 0 : n + 100; }`. – T.C. Aug 06 '14 at 05:24
  • @T.C. I did not get a chance to add the test results from godbolt earlier but that is indeed the case. – Shafik Yaghmour Aug 06 '14 at 13:20
  • @JeremyFriesner I agree with you, as I stated in [my comment here](http://stackoverflow.com/questions/24296571/why-does-this-loop-produce-warning-iteration-3u-invokes-undefined-behavior-an/24297811#comment37568135_24297811), which involves a similar issue. In that case the compiler did warn of undefined behavior, but in many cases it does not, which I don't fully understand. – Shafik Yaghmour Aug 07 '14 at 04:13
  • I wonder how the supposed efficiency gains from allowing overflow to yield unconstrained behavior compare with the costs of having to use code like the above to perform what would otherwise be simply "return si_a+si_b;"? – supercat May 09 '16 at 20:08