
test.c

#include <limits.h>
#include <stdio.h>
int main(void)
{
    int a = INT_MAX - 1;
    if (a + 100 < a)
    {
        printf("overflow\n");
        return 1;
    }
    printf("%i\n", a + 100);
    return 0;
}

When I compile this code with GCC at different optimisation levels, why do I get different outputs?

When I compile with gcc test.c, the output is overflow,

but with gcc -O2 test.c the output is -2147483550.

Can someone please explain why this is happening? I want the compiler to detect the overflow in all cases; what changes should I make to the code?

  • `(INT_MAX - 1) + 100` is UB. UB can manifest itself during compilation. Apparently your compiler decided to produce (*UB-flawed*) executables -- one with the `if`, another without. – pmg Jan 21 '21 at 09:34
  • There is no way for `a+100 – Paul Hankin Jan 21 '21 at 09:53
  • This question investigates ways to detect overflow in a robust way: https://stackoverflow.com/questions/199333/how-do-i-detect-unsigned-integer-multiply-overflow – Paul Hankin Jan 21 '21 at 09:55

4 Answers


One way to detect it is by compiling with `-fsanitize=undefined`.

On your code:

$ gcc -Wstrict-overflow k.c -fsanitize=undefined
$ ./a.out 
k.c:6:9: runtime error: signed integer overflow: 2147483646 + 100 cannot be represented in type 'int'
overflow
klutt

Signed integer overflows are undefined behavior bugs and usually cannot be detected by the compiler. The different buggy results you get depending on optimization level are a perfect example of why we should never write code that depends on undefined behavior: anything could happen.

The correct way to write such a program is to check for overflow properly before it can happen; see for example How do I detect unsigned integer multiply overflow?. In this case, simply check that `a <= INT_MAX - 100` before computing `a + 100`. Alternatively, cast everything to unsigned types, which don't overflow but "wrap around" in well-defined ways.

Lundin

This answer explains why compiler optimization behaves this way. In short, it is a consequence of the fact that the transformation is allowed by the rules of the C standard and is desired because it provides better performance for correct programs (programs that do not use behavior that is not defined).

An effect of optimization by GCC is to apply transformations to the program that are valid logical deductions that ignore undefined behavior.

In its default mode, GCC generates code that largely literally follows the source code. For if (a+100 < a), GCC generates code like:

  1. Load a into register r0.
  2. Add 100 to register r0.
  3. Compare register r0 to a.

Thus, GCC actually performs the operations in the expression. Because a has the value INT_MAX-1, and the hardware wraps when 100 is added, the result is less than a, and the comparison evaluates to true so the “then” statement of the if is executed. (Testing on Godbolt shows this occurs with GCC 7.3 and prior. In GCC 8.1, the behavior with default settings appears to have changed.)

When optimization is requested, GCC creates a semantic model of the program, analyzes it, and applies transformations that produce code that is equivalent within the rules of the C standard (or other language it is compiling).

One mathematical truth for real numbers is that if h is not negative, then a+h < a is always false. While this statement is true for real numbers, it is not true for unsigned arithmetic, because unsigned arithmetic wraps. However, it is true for int arithmetic if overflow does not occur.

Now, if overflow does occur, the behavior is not defined by the C standard. We then have two possibilities:

  • If overflow does not occur, the rule is mathematically valid, and using it results in a transformed program that is equivalent within the rules of the standard.
  • If overflow does occur, the behavior of the program is not defined by the C standard, so the transformed program is also allowed by the rules of the C standard.

This means we can always apply the rule and ignore whether overflow occurs or not, and we will be conforming to the C standard.

But why do we want to do this? We have just allowed any arbitrary transformation of our program if overflow occurs. Well, there is a good result from this. Sometimes, in the middle of a program, we might find code that, by itself, could have overflow. If we restrained ourselves from performing this optimization, the resulting program would be slower, since it was not optimized. But we can add an assumption: The programmer designed this program correctly. Whatever particular situation we are in in the middle of this program, the programmer should have designed the control flow so that the overflow situation does not happen here. So even though overflow could happen at this spot if this routine were called with, say, x equal to some particular value, the programmer should have written the program so the routine is never called like that.

Therefore a choice was made to assume that, for the purposes of applying optimizations, overflow does not happen. In consequence, GCC uses the rule (or something similar) that, for signed arithmetic, if h is not negative, then a+h < a is always false.

So, when GCC sees the code a+100 < a and is optimizing, it replaces this code with 0, meaning false. Then it further optimizes the if and removes the “then” statement completely.

Of course, you might ask, well, int a=INT_MAX-1 is just above a+100<a, cannot the compiler see that and know a+100<a is “true” in this case? Theoretically, maybe. But computers are not intuitive and do not always look at the whole situation. The compiler may know a is a constant with a particular value, and it may at times evaluate expressions like a+100<a at compile time. But it is built with thousands of rules, and it applies them mechanically in some order resulting from its programming. It is not easy to design software to step back and look at the big picture. Once it finds its optimization to change a+100<a to 0, it applies it, and the change is done.

Eric Postpischil

Instead of invoking UB, it is enough to change the condition so that it tests for overflow without invoking UB.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int a = INT_MAX - 1;
    if (a > INT_MAX - 100)
    {
        printf("overflow\n");
        return 1;
    }
    printf("%i\n", a + 100);
    return 0;
}

Answering the question of why: in this trivial example the compiler does not generate code comparing the values of the variable `a` (it is actually optimized out); it directly prints the output. It shows that internally to evaluate this condition, the compiler uses longer integer types where this condition is false. If you force the compiler not to optimize out the variable `a`, the behaviour changes (but of course it is still UB and the behaviour is not guaranteed).

#include <limits.h>
#include <stdio.h>
int main(void)
{
    volatile int a = INT_MAX - 1;
    if (a + 100 < a)
    {
        printf("overflow\n");
        return 1;
    }
    printf("%i\n", a + 100);
    return 0;
}

https://godbolt.org/z/hcz9Yq

0___________
  • Where/how do you think your example shows the compiler is using longer integer types where the condition is false? In the Godbolt link you provide, not only is 32-bit arithmetic used, but the program behavior is to print “overflow” and return 1, showing the condition evaluated as true, not false. – Eric Postpischil Jan 21 '21 at 12:18
  • @EricPostpischil it is an example for the second part of the answer starting from `If you force the compiler not to optimize out the variable a`, not for the first part. The first part example is here: https://godbolt.org/z/TeYa9q. But I think it was quite easy to spot considering the `volatile` keyword. But if someone is looking for the nitpick ... – 0___________ Jan 21 '21 at 13:46
  • If it applies to the first code, then it is not the case that “It shows that internally to evaluate this condition, the compiler uses longer integer types where this condition is false”: That has not been shown. It is a conjecture without evidence or documentation. Further, it does not make sense. If the compiler wants to know what some `int` expression evaluates to, evaluating it with non-`int` arithmetic is not a correct way to produce a result… – Eric Postpischil Jan 21 '21 at 13:55
  • … It is a better fit to how compilers behave that the compiler either evaluates `a+100 < a` using `int` arithmetic in generated code and therefore gets “true” and produces the “overflow” output, as seen in the OP’s execution without optimization, or that the compiler optimizes the arithmetic with the assumption that overflow does not occur (or, equivalently, may be ignored) and therefore gets “false” and generates code that does not produce the “overflow” output, as seen in the OP’s execution with optimization. Wider arithmetic is never needed nor appropriate. – Eric Postpischil Jan 21 '21 at 13:57
  • @EricPostpischil `with non-int arithmetic is not a correct` Yes indeed. Then report the bug. – 0___________ Jan 21 '21 at 13:57
  • There is no evidence of a bug because there is no evidence non-`int` arithmetic was used. – Eric Postpischil Jan 21 '21 at 13:58
  • Without the optimization the compiler is not evaluating it, and the rest of your comment makes no sense – 0___________ Jan 21 '21 at 13:58
  • Without optimization, the compiler generates code that evaluates the expression, then the program runs and evaluates it, and the result is “true” (1) and the program outputs “overflow”. – Eric Postpischil Jan 21 '21 at 13:59
  • @EricPostpischil it is very unlikely for gcc to internally use a different int format than two's complement - so this condition has to be evaluated using a larger int (when optimizing) – 0___________ Jan 21 '21 at 14:00
  • Generating the code - not evaluating the expression during compilation. Do you see the difference? – 0___________ Jan 21 '21 at 14:01
  • No, it does not have to be evaluated using large `int`. That is because the C semantics do not require any sort of “correct” arithmetic be used other than `int`. The compiler has no obligation to use larger arithmetic to get a result different than `int` arithmetic would provide. – Eric Postpischil Jan 21 '21 at 14:01
  • @EricPostpischil `No, it does not have to be evaluated using large int` where did I write that it has to? But with -Ox where x > 0 it optimizes out the variable `a` assuming the condition is false. – 0___________ Jan 21 '21 at 14:03
  • Again, I described two scenarios: One, no optimization is requested. The compiler generates code to evaluate the expression. That code uses `int` arithmetic. The program runs. It evaluates the expression. The hardware wraps the result. The expression evaluates as true. The program prints “overflow”. Two, optimization is requested. The compiler applies a rule that transforms `a+h < a` to false, without evaluating it. The compiler removes the dead “then” branch and generates code without it. The program executes and does not print “overflow.” … – Eric Postpischil Jan 21 '21 at 14:03
  • … These two scenarios provide a complete explanation for the observed results that is consistent with the C standard and the general goals and design of GCC. And so they demonstrate it is not necessary to use wider-than-`int` arithmetic to get the observed results. Therefore, it has not been shown that wider-than-`int` arithmetic is used. – Eric Postpischil Jan 21 '21 at 14:04
  • `a+h < a to false, without evaluating it` Now you are assuming something. ***`It is a conjecture without evidence or documentation`*** – 0___________ Jan 21 '21 at 14:05
  • No, I did not assume it. I offered two scenarios that explain the observed results. I have not asserted in these comments that they are necessarily true. The point is that there are at least two explanations for how the program might not take the “then” branch: say explanation E0 and E1. Yours is E0, in which wider arithmetic is used. Another is E1. You have not shown that E1 does not happen, and therefore you have not shown that E0 does happen. The statement in the answer that “It shows…” is false; it does not show that. – Eric Postpischil Jan 21 '21 at 14:09
  • Re “No, it does not have to be evaluated using large int where did I write that it has to?”: In the answer, you state “It shows that internally to evaluate this condition, the compiler uses longer integer types…”. This is false; it is not shown. – Eric Postpischil Jan 21 '21 at 14:09