Undefined behavior signed-integer overflow/underflow, and well-defined behavior unsigned-integer overflow/underflow, in C and C++
If I enter 200, 300, and 358 for W, H, and D, I get the following output, which makes perfect sense for my gcc compiler on a 64-bit Linux machine:
2148000000
18446744071562584320
Why does this make perfect sense?
Well, the default type is int, which is int32_t for the gcc compiler on a 64-bit Linux machine. Its max value is 2^32/2 - 1 = 2147483647, and its min value is -2147483648. The line sum_2 += W * H * D * 100; does int arithmetic, since that's the type of each operand there, the literal 100 included, and no explicit cast is used. Only after the int arithmetic is done is the int result implicitly converted to a uint64_t, as it is stored into the uint64_t sum_2 variable. The mathematically correct product on the right-hand side, however, is 2148000000, which is larger than the max int value. That is undefined-behavior signed integer overflow: in practice here, the value goes over the top of the max int value, wraps around to the min int value, and counts up again from there.
Even though, according to the C and C++ standards, signed integer overflow or underflow is undefined behavior, in the gcc compiler, I know that signed integer overflow happens to roll over to negative values if it is not optimized out. By default, however, this is still "undefined behavior", and a bug, and must not be relied upon. See the notes below for details, and for information on how to make this well-defined behavior via a gcc extension.
Anyway, 2148000000 - 2147483647 = 516353 up-counts past the max int value, the first of which causes the roll-over. The first count up rolls over to the min int32_t value of -2147483648, and the remaining 516353 - 1 = 516352 counts go up to -2147483648 + 516352 = -2146967296. So, the result of W * H * D * 100 for the inputs above is now -2146967296, based on undefined behavior.
Next, that value is implicitly converted from an int (int32_t in this case) to a uint64_t in order to store it into the uint64_t sum_2 variable, which is well-defined-behavior unsigned integer underflow. You start with -2146967296. The first down-count from 0 underflows to uint64_t max, which is 2^64 - 1 = 18446744073709551615. Now subtract the remaining 2146967296 - 1 = 2146967295 counts from that and you get 18446744073709551615 - 2146967295 = 18446744071562584320, just as shown above!
Voila! With a little compiler and hardware architecture understanding, and some expected but undefined behavior, the result is perfectly explainable and makes sense!
To easily see the negative value, add this to your code:
int sum_3 = W*H*D*100;
cout << sum_3 << endl; // output: -2146967296
Notes
Never intentionally leave undefined behavior in your code. That is known as a bug. You do not have to write ISO C++, however! If you can find compiler documentation indicating a certain behavior is well-defined, that's ok, so long as you know you are writing in the g++ language and not the C++ language, and don't expect your code to work the same across compilers. Here is an example where I do that: Using Unions for "type punning" is fine in C, and fine in gcc's C++ as well (as a gcc [g++] extension). I'm generally okay with relying on compiler extensions like this. Just be aware of what you're doing is all.
@user17732522 makes a great point in the comments here:
"in the gcc compiler, I know that signed integer overflow happens to roll over to negative values.": That is not correct by-default. By-default GCC assumes that signed overflow does not happen and applies optimizations based on that. There is the -fwrapv and/or -fno-strict-overflow flag to enforce wrapping behavior. See https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Code-Gen-Options.html#Code-Gen-Options.
Take a look at that link above (or even better, this one, which always points to the latest gcc documentation instead of the documentation for just one version: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#Code-Gen-Options). Even though signed-integer overflow and underflow is undefined behavior (a bug!) according to the C and C++ standards, gcc lets you make it well-defined behavior (not a bug!), as an extension, so long as you use the proper gcc build flags. Using -fwrapv makes signed-integer overflow/underflow well-defined behavior as a gcc extension. Additionally, -fwrapv-pointer allows pointers to safely overflow and underflow when used in pointer arithmetic, and -fno-strict-overflow applies both -fwrapv and -fwrapv-pointer. The relevant documentation is here: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#Code-Gen-Options (emphasis added):
These machine-independent options control the interface conventions used in code generation.
Most of them have both positive and negative forms; the negative form of -ffoo is -fno-foo.
...
-fwrapv
This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation. This flag enables some optimizations and disables others. The options -ftrapv and -fwrapv override each other, so using -ftrapv -fwrapv on the command-line results in -fwrapv being effective. Note that only active options override, so using -ftrapv -fwrapv -fno-wrapv on the command-line results in -ftrapv being effective.
-fwrapv-pointer
This option instructs the compiler to assume that pointer arithmetic overflow on addition and subtraction wraps around using twos-complement representation. This flag disables some optimizations which assume pointer overflow is invalid.
-fstrict-overflow
This option implies -fno-wrapv -fno-wrapv-pointer and when negated [as -fno-strict-overflow] implies -fwrapv -fwrapv-pointer.
So, relying on signed-integer overflow or underflow without using the proper gcc extension flags above is undefined behavior, and therefore a bug, and cannot be safely relied upon! The compiler is allowed to optimize on the assumption that the overflow never happens, so the code may not work reliably as intended without the gcc extension flags above.
My test code
Here is the full code I used for some quick checks while writing this answer. I ran it with the gcc/g++ compiler on a 64-bit Linux machine. I did not use the -fwrapv or -fno-strict-overflow flags, so all signed integer overflow or underflow demonstrated below is undefined behavior, a bug, and cannot be relied upon safely without those gcc extension flags. The fact that it works is circumstantial, as the compiler could, by default, choose to optimize out the overflows in unexpected ways.
If you run this on an 8-bit microcontroller such as an Arduino Uno, you'd get different results, since an int is a 2-byte int16_t by default there instead! But, now that you understand the principles, you could figure out the expected result. (Also, as far as I know, 64-bit integer types such as uint64_t do exist with avr-gcc on that architecture; it is double that is only 32 bits there by default.)
#include <iostream>
#include <cstdint>
using namespace std;
int main()
{
    int W, H, D;
    uint64_t sum = 0;
    uint64_t sum_2 = 0;
    // cin >> W >> H >> D;
    W = 200;
    H = 300;
    D = 358;
    // 64-bit arithmetic: the cast widens the whole multiplication chain
    sum += static_cast<uint64_t>(W) * H * D * 100;
    // int arithmetic: signed overflow (undefined behavior), then conversion to uint64_t
    sum_2 += W * H * D * 100;
    cout << sum << endl;
    cout << sum_2 << endl;
    int sum_3 = W*H*D*100; // signed overflow again (undefined behavior)
    cout << sum_3 << endl;
    sum_2 = -1; // underflow to uint64_t max
    cout << sum_2 << endl;
    sum_2 = 18446744073709551615ULL - 2146967295;
    cout << sum_2 << endl;
    return 0;
}