
I have a question regarding conversion of integers:

#include <iostream>
#include <cstdint>
using namespace std;

int main()
{
    
    int N,R,W,H,D;
    uint64_t sum = 0;
    uint64_t sum_2 = 0;
    cin >> W >> H >> D;
    sum += static_cast<uint64_t>(W) * H * D * 100;
    sum_2 += W * H * D * 100;
    
    cout << sum << endl;
    cout << sum_2 << endl;
    return 0;
}

I thought that sum should be equal to sum_2, because the uint64_t type is wider than int, and during arithmetic operations the compiler chooses the wider type (which is uint64_t). So by my understanding, sum_2 must have uint64_t type. But it behaves as if it has int type.

Can you explain to me why sum_2 was computed as int? Why didn't it stay uint64_t?

Daniel Yefimov
    The `W * H * D * 100` part is perform with `int` types because all the types in that sequence are `int` - only the assignment converts the result of this calculation into an `uint64_t` – UnholySheep Aug 01 '22 at 21:28
    `sum_2` **does** have `uint64_t` type. But for the reason UnholySheep gave, you only fed `int` values into it. – Ben Voigt Aug 01 '22 at 21:51

2 Answers


Undefined behavior signed-integer overflow/underflow, and well-defined behavior unsigned-integer overflow/underflow, in C and C++

If I enter 200, 300, and 358 for W, H, and D, I get the following output, which makes perfect sense for my gcc compiler on a 64-bit Linux machine:

2148000000
18446744071562584320

Why does this make perfect sense?

Well, the default type is int, which is int32_t for the gcc compiler on a 64-bit Linux machine; its max value is 2^31 - 1 = 2147483647, and its min value is -2^31 = -2147483648. The line sum_2 += W * H * D * 100; does int arithmetic, since that's the type of every operand there (100 included) and no explicit cast is used. Only afterward does it implicitly convert the int result to uint64_t as it stores the result into the uint64_t sum_2 variable. The int arithmetic on the right-hand side, however, tries to produce 2148000000, which overflows past the max int value — undefined-behavior signed integer overflow that, on this compiler and hardware, wraps past the min int value and back up.

Even though, according to the C and C++ standards, signed integer overflow or underflow is undefined behavior, in the gcc compiler I know that signed integer overflow happens to roll over to negative values if it is not optimized out. By default this is still "undefined behavior", and a bug, and must not be relied upon. See the notes below for details and for how to make this well-defined behavior via a gcc extension.

Anyway: 2148000000 - 2147483647 = 516353 up-counts, the first of which causes roll-over. That first count up rolls over to the min int32_t value of -2147483648, and the remaining 516353 - 1 = 516352 counts go up to -2147483648 + 516352 = -2146967296. So, for the inputs above, the result of W * H * D * 100 is -2146967296, based on undefined behavior.

Next, that value is implicitly converted from int (int32_t in this case) to uint64_t in order to store it into the uint64_t sum_2 variable, and that conversion is well-defined: it wraps modulo 2^64. Start from -2146967296: the first down-count below zero wraps to the uint64_t max of 2^64 - 1 = 18446744073709551615, and subtracting the remaining 2146967296 - 1 = 2146967295 counts gives 18446744073709551615 - 2146967295 = 18446744071562584320, just as shown above!

Voila! With a little compiler and hardware architecture understanding, and some expected but undefined behavior, the result is perfectly explainable and makes sense!

To easily see the negative value, add this to your code:

int sum_3 = W*H*D*100;
cout << sum_3 << endl;  // output: -2146967296

Notes

  1. Never intentionally leave undefined behavior in your code. That is known as a bug. You do not have to write ISO C++, however! If you can find compiler documentation indicating a certain behavior is well-defined, that's ok, so long as you know you are writing in the g++ language and not the C++ language, and don't expect your code to work the same across compilers. Here is an example where I do that: Using Unions for "type punning" is fine in C, and fine in gcc's C++ as well (as a gcc [g++] extension). I'm generally okay with relying on compiler extensions like this. Just be aware of what you're doing is all.

  2. @user17732522 makes a great point in the comments here:

    "in the gcc compiler, I know that signed integer overflow happens to roll over to negative values.": That is not correct by-default. By-default GCC assumes that signed overflow does not happen and applies optimizations based on that. There is the -fwrapv and/or -fno-strict-overflow flag to enforce wrapping behavior. See https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Code-Gen-Options.html#Code-Gen-Options.

    Take a look at that link above (or, even better, this one, which always points to the latest gcc documentation rather than the documentation for just one version: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#Code-Gen-Options). Even though signed-integer overflow and underflow are undefined behavior (a bug!) according to the C and C++ standards, gcc allows you, by extension, to make them well-defined behavior (not a bug!) so long as you use the proper gcc build flags. Using -fwrapv makes signed-integer overflow/underflow well-defined behavior as a gcc extension. Additionally, -fwrapv-pointer allows pointers to safely overflow and underflow when used in pointer arithmetic, and -fno-strict-overflow applies both -fwrapv and -fwrapv-pointer. The relevant documentation is here: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#Code-Gen-Options (emphasis added):

    These machine-independent options control the interface conventions used in code generation.

    Most of them have both positive and negative forms; the negative form of -ffoo is -fno-foo.

    ...

    • -fwrapv
      This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation. This flag enables some optimizations and disables others. The options -ftrapv and -fwrapv override each other, so using -ftrapv -fwrapv on the command-line results in -fwrapv being effective. Note that only active options override, so using -ftrapv -fwrapv -fno-wrapv on the command-line results in -ftrapv being effective.

    • -fwrapv-pointer
      This option instructs the compiler to assume that pointer arithmetic overflow on addition and subtraction wraps around using twos-complement representation. This flag disables some optimizations which assume pointer overflow is invalid.

    • -fstrict-overflow
      This option implies -fno-wrapv -fno-wrapv-pointer and when negated [as -fno-strict-overflow] implies -fwrapv -fwrapv-pointer.

    So, relying on signed-integer overflow or underflow withOUT using the proper gcc extension flags above is undefined behavior, and therefore a bug, and cannot be safely relied upon! It may be optimized out by the compiler and not work reliably as intended without the gcc extension flags above.

My test code

Here is my total code I used for some quick checks to write this answer. I ran it with the gcc/g++ compiler on a 64-bit Linux machine. I did not use the -fwrapv or -fno-strict-overflow flags, so all signed integer overflow or underflow demonstrated below is undefined behavior, a bug, and cannot be relied upon safely without those gcc extension flags. The fact that it works is circumstantial, as the compiler could, by default, choose to optimize out the overflows in unexpected ways.

If you run this on an 8-bit microcontroller such as an Arduino Uno, you'd get different results, since there an int is a 2-byte int16_t by default! But now that you understand the principles, you can work out the expected result. (64-bit types do still exist on that architecture — avr-gcc emulates them in software — but the 16-bit int makes the unsuffixed arithmetic wrap much sooner.)

#include <iostream>
#include <cstdint>
using namespace std;

int main()
{
    
    int N,R,W,H,D;
    uint64_t sum = 0;
    uint64_t sum_2 = 0;
    // cin >> W >> H >> D;
    W = 200;
    H = 300;
    D = 358;
    sum += static_cast<uint64_t>(W) * H * D * 100;
    sum_2 += W * H * D * 100;
    
    cout << sum << endl;
    cout << sum_2 << endl;
    
    int sum_3 = W*H*D*100;
    cout << sum_3 << endl;
    
    sum_2 = -1; // underflow to uint64_t max
    cout << sum_2 << endl;
    
    sum_2 = 18446744073709551615ULL - 2146967295;
    cout << sum_2 << endl;
    
    return 0;
}
Gabriel Staples
    "_in the gcc compiler, I know that signed integer overflow happens to roll over to negative values._": That is not correct by-default. By-default GCC assumes that signed overflow does not happen and applies optimizations based on that. There is the `-fwrapv` and/or `-fno-strict-overflow` flag to enforce wrapping behavior. See https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Code-Gen-Options.html#Code-Gen-Options. – user17732522 Aug 01 '22 at 22:42
  • @user17732522, thank you for that information! I've substantially updated my answer to incorporate it. I have never used the gcc flags to make signed-integer overflow and underflow well-defined behavior before. – Gabriel Staples Aug 02 '22 at 16:44

Just a short version of @Gabriel Staples' good answer.

"and during arithmetic operations compiler chooses bigger type(which is uint64_t)"

There is no uint64_t in W * H * D * 100, just four int operands. Only after this multiplication is the int product (which overflowed, which is UB) converted to uint64_t by the assignment.

Instead, use 100LLU * W * H * D to perform a wider unsigned multiplication.

chux - Reinstate Monica