
I am trying to understand the meaning of the statement:

    (int)(unsigned)-1 == -1;

To my current understanding the following things happen:

  1. -1 is a signed int and is cast to unsigned int. Due to wrap-around (modular) behavior, the result is the maximum value representable by the unsigned type.

  2. Next, this maximum unsigned value from step 1 is cast back to signed int. But this value is out of range for the signed type, and since signed integer overflow is undefined behavior, the program will have undefined behavior.

My questions are:

  1. Is my above explanation correct? If not, what is actually happening?
  2. Is this undefined behavior, as I suspected, or implementation-defined behavior?

PS: I know that if it is undefined behavior (as opposed to implementation-defined), then we cannot rely on the output of the program, so we cannot say whether we will always get true or false.

Peter Cordes
Jason
  • You can check this more robustly with `constexpr` or `consteval`: https://gcc.godbolt.org/z/7czT3KEY5 – chris Jan 21 '22 at 06:07
  • This question could _probably_ be tagged `c` as well. Especially considering you're using the verboten C style cast. – Roflcopter4 Jan 21 '22 at 06:26
  • @Roflcopter4 Yes, using `static_cast` instead of a C-style cast will have the same effect on my program. I was thinking of using `static_cast` in my question, but then I thought that if I understood what happens with the C-style cast, I would automatically understand the `static_cast` case. So I didn't bother to change my program to use `static_cast` explicitly. – Jason Jan 21 '22 at 06:39
  • @chris Thank you. Can you explain in more detail in an answer? – Jason Jan 21 '22 at 06:40
  • @JasonLiam I am fully aware that static_cast will be precisely equivalent in this case. I only mentioned it to back up the idea that you could tag this question as being relevant in `c`, since you wrote essentially C code. – Roflcopter4 Jan 21 '22 at 06:44
  • @Roflcopter4 Yes, I know that you know that they (the C-style cast and `static_cast` in my program) are equivalent. I just didn't want to tag two languages simultaneously, to reduce chaos in the answers. – Jason Jan 21 '22 at 07:22
  • @JasonLiam, My comment isn't really an answer, just a trick. Most forms of undefined behaviour are required to produce a compiler error if encountered during forced constant evaluation, so slapping them in a `constexpr` function and forcing it to evaluate at compile-time will catch most things. The parameter is there to appease the rule that at least one path through the function must be valid at compile-time, and that rule doesn't require a diagnostic, so rather than relying on the compiler to catch that, I produce a valid function and force constant evaluation of the code in question. – chris Jan 21 '22 at 18:32

2 Answers


The cast to unsigned int wraps around; this part is legal.

Out-of-range cast to int is legal starting from C++20, and was implementation-defined before (but worked correctly in practice anyway). There's no UB here.

The two casts cancel each other out (again, guaranteed in C++20, implementation-defined before, but worked in practice anyway).

Signed overflow is normally UB, yes, but that only applies to overflow caused by a computation. Overflow caused by a conversion is different.

From cppreference:

If the destination type is signed, the value does not change if the source integer can be represented in the destination type. Otherwise the result is:

(until C++20) implementation-defined

(since C++20) the unique value of the destination type equal to the source value modulo 2^n, where n is the number of bits used to represent the destination type.

(Note that this is different from signed integer arithmetic overflow, which is undefined).


More specifics on how the conversions work.

Let's say int and unsigned int occupy N bits.

The values that are representable by both int and unsigned int are unchanged by the conversion. All other values are increased or decreased by 2^N to fit into the range.

This conveniently doesn't change the binary representation of the values.

E.g. int -1 corresponds to unsigned int 2^N-1 (the largest unsigned int value), and both are represented as 11...11 in binary. Similarly, int -2^(N-1) (the smallest int value) corresponds to unsigned int 2^(N-1) (the largest int value + 1).

int:   [-2^(N-1)] ... [-1] [0] [1] ... [2^(N-1)-1]
            |           |   |   |           |
            |           '---|---|-----------|-----------------------.
            |               |   |           |                       |
            '---------------|---|-----------|----------.            |
                            |   |           |          |            |
                            V   V           V          V            V
unsigned int:              [0] [1] ... [2^(N-1)-1] [2^(N-1)] ... [2^N-1]
HolyBlackCat
  • @JasonLiam Better now? – HolyBlackCat Jan 21 '22 at 07:33
  • @JasonLiam Mm, okay. – HolyBlackCat Jan 21 '22 at 07:55
  • @Jason - The pre-C++20 "implementation defined" part was whether the compiler used two's complement for signed numbers. And they all did - that's why C++20 can require it. – BoP Jan 21 '22 at 10:22
  • @JasonLiam, While this changed with the paper that required two's complement, the answer cites why it was implementation-defined regardless of two's complement. If `int` and `unsigned` have an equal number of bits, then no matter what representation is used, the value `UINT_MAX` (required from `(unsigned)-1`, it's not a `reinterpret_cast`) is never going to fit into an `int`, so the conversion was always implementation-defined. The paper also proposed wrap-around overflow, but went back on that: https://youtu.be/JhUxIVf1qok – chris Jan 21 '22 at 18:46
  • @JasonLiam Presumably the assumption that nothing ever overflows allows more optimizations. – HolyBlackCat Jan 21 '22 at 18:48
  • @HolyBlackCat: An assumption that certain actions may be regarded as equivalent, without regard for whether overflow occurs, allows optimizing transforms that can improve the performance of many real programs without *adversely* affecting their behavior. An assumption that all possible responses to inputs that would cause integer overflow are equally tolerable allows dangerous transforms which may offer huge performance benefits for contrived programs, but generally minimal benefits for practical ones, while undermining their ability to limit the effects of maliciously crafted inputs. – supercat Apr 08 '22 at 17:34

Edit: Versions prior to C++20 do not require two's complement.

It might help to understand the binary representation of -1, which is: 0b1111'1111'1111'1111'1111'1111'1111'1111. I should note that this is a 4-byte representation of -1. Different machines might represent int differently.

You are casting to unsigned and then back to int, which changes how you interpret the 1's and 0's. In decimal (sometimes called denary), what you're doing is the equivalent of converting it to 4294967295 and then back to -1.

The type of the integer literal is the first type in which the value can fit, from the list of types which depends on which numeric base and which integer-suffix was used

Because -1 fits into int, that is how it's interpreted. See: https://en.cppreference.com/w/cpp/language/integer_literal

this is out of range of the signed type. And since signed integer overflow

This is not integer overflow. This is binary representation. Integer overflow would be:

    int int_max = 2147483647;     // the maximum int on my machine (INT_MAX)
    int one = 1;

    int overflow = int_max + one; // undefined behavior: the mathematical sum
                                  // 2147483648 exceeds the maximum value an int
                                  // can represent (on typical machines it wraps
                                  // to -2147483648, but that is not guaranteed)

The above is an example of integer overflow because the result exceeds the maximum value representable by int, which makes it undefined behavior.

Jarod42
Happy Jerry
  • The binary representation of `-1` is `1111...` only for a two's-complement system. Granted, _all_ current systems are two's complement, but the C and C++ standards define an abstract language that certainly does not require any particular kind of signed integer representation. Assuming implementation-defined results begs the question here. In C (unsure about C++) signed integer overflow is undefined (also undefined: integer division by 0 - note that in the theoretical abstract C universe both things are equally logically meaningless). – Roflcopter4 Jan 21 '22 at 06:50
  • This answer is clearly wrong, just like the answer @Shahriar posted (now deleted). I think you should either update this answer to show what is correct or delete it. – Jason Jan 21 '22 at 06:55
  • @Roflcopter4 C++20 does require two's complement. I will specify that in the post. – Happy Jerry Jan 21 '22 at 06:58
  • @JasonLiam C++20 requires two's complement. I edited the post to reflect that. – Happy Jerry Jan 21 '22 at 07:03
  • @HappyJerry I have one query about your answer. **1)** Since the language guarantees that `int` is at least 16-bit, shouldn't `255` be `65535` in your answer? – Jason Jan 21 '22 at 07:13
  • @JasonLiam You're correct. It's 2 am here lol. I'll update my post accordingly. – Happy Jerry Jan 21 '22 at 07:23
  • C++ defines casts in terms of preserving the value, not reinterpreting bits. It's only a coincidence that unsigned -> signed conversion for 2's complement systems doesn't change the bit-pattern, and that's not *why* it's well-defined in C++. The language does go out of its way to specify that implementations must define the behaviour of unsigned -> signed integer conversions even when it doesn't fit. If not for that, your answer would just show why it happens to work, not why the language standard requires it. The standard also guarantees that casting a `uint64_t` to `int16_t` is not UB. – Peter Cordes Apr 07 '22 at 00:04
  • @PeterCordes: C was originally designed for machines where most operations on signed and unsigned values' bit patterns were homomorphic. While the Standard accommodates machines where that is not the case by specifying one construct whose behavior must be consistent with the homomorphic one, the relationship to bit patterns is only coincidental if one looks at the Standard in isolation without regard for preceding historical practices. – supercat Apr 14 '22 at 16:08