5

I have a legacy codebase which we are trying to migrate from devtoolset-4 to devtoolset-7. I noticed an interesting behaviour regarding overflow of signed integers (int64_t, to be specific).

There is a code snippet which is used to detect integer overflow while multiplying a big set of integers:

// a and b are int64_t
int64_t product = a * b; 
if (b != 0 && product / b != a) {
    // Overflow
}

This code was working fine with devtoolset-4. However, with devtoolset-7, overflow is never being detected.

For eg: When a = 83802282034166 and b = 98765432, the product becomes -5819501405344925872 (clearly the value has overflown).

But product / b results in value equal to a (83802282034166). Hence the if condition never becomes true. Its value should have been computed based on the overflown (negative) product value: -5819501405344925872 / 98765432 = -58922451788

Ironically, the Maths is correct but it is causing anomalous behaviour with regards to devtoolset-4.

  • Could the compiler be caching the value (and not re-evaluating it) resulting in this behaviour?
  • Or does compiler optimization converts statement product / b != a to product != a * b and reaches the same overflown value (or maybe just skips the computation based on the above statement where product = a * b)?

I understand that signed integer overflow is an 'undefined behaviour' in C++ and so the compiler behaviour could change across implementations. But could someone help me make sense of the above behaviour?

Note: the g++ versions in devtoolset-4 and devtoolset-7 are g++ (GCC) 5.2 and g++ (GCC) 7.2.1, respectively.

Still-InBeta
  • 383
  • 2
  • 9
  • 4
    `signed` integer overflow is undefined in c++. You cannot reliably detect it after the fact since you would implicitly be in undefined behavior.. – François Andrieux Mar 28 '18 at 15:35
  • A correct test checks **before** multiplying, in order to avoid undefined behavior. Something like `if (std::numeric_limits::max() / b < a) { /* error */ }`. – Pete Becker Mar 28 '18 at 15:41
  • @PeteBecker Yes Pete that is what I have modified the code to. I was just trying to understand the above behaviour. – Still-InBeta Mar 28 '18 at 15:43
  • To understand the behavior, you'd want to look at the generated assembly code to see what the compiler has done with it. – 1201ProgramAlarm Mar 28 '18 at 15:44
  • 3
    `product == a * b`, so `product / b != a` can be optimized to -> `a * b / b != a` -> `a * 1 != a` -> `a != a` -> `false`. – eerorika Mar 28 '18 at 15:45
  • 1
    The OP states he knows signed integer overflow is UB. Linking to a question whose answers state "this is UB" is not useful. – Yakk - Adam Nevraumont Mar 28 '18 at 15:45
  • 2
    If you are interested in understanding what compilers output for a given piece of code, you can use godbolt.org. For example I've [tried your snippet](https://godbolt.org/g/XkjseD) and it seems the condition is entirely optimized out by gcc 7.3 with -O2. For an explanation of why undefined behavior does what it does you would need to specify exactly which compiler you are using, which version you are using, which flags are using and the answer could still depend on other factors. – François Andrieux Mar 28 '18 at 15:50

7 Answers7

6

Signed integer overflow is undefined behavior in C++.

This means the optimizer can assume it never happens. a*b/b is a, period.

Modern compilers do static single assignment based optimization.

// a and b are int64_t
int64_t product = a * b;
if (b != 0 && product / b != a) {
  // Overflow
}

becomes:

const int64_t __X__ = a * b; 
const bool __Y__ = b != 0;
const int64_t __Z__ = __X__ / b;
const int64_t __Z__ = a*b / b;
const int64_t __Z__ = a;

if (__Y__ && __Z__ != a) {
  // Overflow
}

which evaluates to

if (__Y__ && false) {
  // Overflow
}

clearly, as __Z__ is a and a!=a is false.

int128_t big_product = a * b; 

work with big_product and detect overflow there.

SSA permits the compiler to realize things like (a+1)>a is always true, which can simplify many loops and optimization cases. That fact relies on the fact that overflow of signed values is undefiend behavior.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
4

Because signed overflow/underflow are classified as undefined behavior, compilers are allowed to cheat and assume it can't happen (this came up during a Cppcon talk a year or two ago, but I forget the talk off the top of my head). Because you're doing the arithmetic and then checking the result, the optimizer gets to optimize away part of the check.

This is untested code, but you probably want something like the following:

if(b != 0) {
    auto max_a = std::numeric_limits<int64_t>::max() / b;
    if(max_a < a) {
        throw std::runtime_error{"overflow"};
    }
}
return a * b;

Note that this code doesn't handle underflow; if a * b can be negative, this check won't work.

Per Godbolt, you can see your version has the check completely optimized away.

Stephen Newell
  • 7,330
  • 1
  • 24
  • 28
  • Thanks @stephen. The Godbolt link answers my question. So it really is due to compiler optimization. It would also be great if you can give me the Cppcon talk link. – Still-InBeta Mar 28 '18 at 15:57
  • Shouldn't it be `a > max_a`? – Aconcagua Mar 28 '18 at 16:07
  • 3
    and negative isn't handled too (both `-big * -big` and `big * -big`) – Jarod42 Mar 28 '18 at 16:09
  • @Aconcagua - Pretty sure I have the check correct. `max_a` should be the largest possible value `int64_t` can store, so there's no way `a` can ever be larger. – Stephen Newell Mar 28 '18 at 17:11
  • @Jarod42 - Good catch on negatives. I'll leave that as an exercise for the reader :) – Stephen Newell Mar 28 '18 at 17:12
  • @StephenNewell `max_a = max_i64 / b`, so `max_a` is the maximum value that can be multiplied with `b`... Or did I read it wrong that returning 0 shall denote the overflow? Possibly better `throw` anyway? – Aconcagua Mar 28 '18 at 17:17
  • @StephenNewell "negatives -> excercise" - suppose would be a good idea adding it to the answer itself, might be overseen within the comments otherwise... – Aconcagua Mar 28 '18 at 17:18
  • @Aconcagua: You're right, I'm an idiot. This is why code reviews exist :). Agree on the `throw`, and somehow didn't think of that when I wrote the answer. – Stephen Newell Mar 28 '18 at 17:19
  • @Still-InBeta - I don't have the bandwidth right now to go through all the old Cppcon talks (although it sounds fun). If memory serves me correctly, this is the correct talk: https://www.youtube.com/watch?v=g7entxbQOCc (could also be this one though: https://www.youtube.com/watch?v=yG1OZ69H_-o). If somebody wants to do the leg work and verify/correct the link, I'm happy to edit the answer. – Stephen Newell Mar 28 '18 at 17:31
  • has anyone observed that undefined behavior on any compiler? – TakeMeAsAGuest Mar 24 '19 at 18:23
  • @TakeMeAsAGuest - Check Godbolt: https://godbolt.org/z/4DdDlT. Both gcc and clang optimize away the overflow check entirely, but only with signed types (unsigned over/underflow is defined). – Stephen Newell Mar 24 '19 at 21:53
  • i dont know assembly much, so what does cpu itself do if it overflows? say addition or multiplication? sorry i dont have c++ compiler right now and also want to let readers learn smt either. – TakeMeAsAGuest Mar 24 '19 at 22:17
  • @TakeMeAsAGuest - The signed versions of the function have three instructions: `mov`, `imul`, `ret`; there's no branching. The unsigned versions have many instructions, including `test` and `je` (jump if equal); if you hover over the instruction, you'll get a brief explanation. As to what the CPU does, that depends on the specific CPU. If you need specifics about an architecture, then you'll need to ask a new question so somebody who knows can answer you. – Stephen Newell Mar 24 '19 at 22:44
  • i was curious about x86's ofc but ok – TakeMeAsAGuest Mar 24 '19 at 23:48
4

With the knowledge thatproduct == a * b, the compiler/optimizer can take following optimization steps:

b != 0 && product / b != a
b != 0 && a * b / b != a
b != 0 && a * 1 != a
b != 0 && a != a
b != 0 && false
false

The optimizer can choose to remove the branch entirely.


I understand that signed integer overflow is an 'undefined behaviour' in C++ and so the compiler behaviour could change across implementations. But could someone help me make sense of the above behaviour?

You may know that signed integer overflow is UB, but I suppose you haven't yet grasped what UB really means. UB doesn't need to, and often doesn't make sense. This case seems straight forward though.

eerorika
  • 232,697
  • 12
  • 197
  • 326
0

could someone help me make sense of the above behaviour?

Signed integer overflow has undefined behaviour in C++. This implies that you cannot reliably detect it, and that code which contains signed integer overflow can do anything.


If you want to detect whether an operation will result in a signed integer overflow or not, you need to do that before the overflow happens, preventing UB from taking place.

Vittorio Romeo
  • 90,666
  • 33
  • 258
  • 416
0

Signed integer overflow is undefined behavior. This is different from unsigned int(all unsigned ints). More info on this here

As a side note, people noticed that using int instead of unsigned int increases performance (see here) since compiler don't deal with overflow behavior.

Raxvan
  • 6,257
  • 2
  • 25
  • 46
0

if you are worried about integer overflows it's not better to use a arbitrary precision integer library - with this you can increase your size types to 128bits and don't worry about it.

https://gmplib.org/

Tomaz Canabrava
  • 2,320
  • 15
  • 20
0

you can read this documentation it might be useful to you as if any problem faced me in variables and data types i go directly to read it : http://www.cplusplus.com/doc/tutorial/variables/