
It seems that `(int16_t)uVal` and `*(int16_t*)&uVal` are equivalent, but I can't figure out why. Here is the related code snippet:

#include <iostream>
#include <cstdint>   // uint16_t, int16_t

void foo(uint16_t uVal)
{
    int16_t auxVal1 = (int16_t) uVal;     // value conversion
    int16_t auxVal2 = *(int16_t*)&uVal;   // pointer cast, then dereference

    std::cout << auxVal1 << std::endl;
    std::cout << auxVal2 << std::endl;

    std::cout << (uint16_t)auxVal1 << std::endl;
    std::cout << *(uint16_t*)&auxVal2 << std::endl;
}

int main()
{
    foo(0xFFFF);
    std::cout << std::endl;

    foo(1);

}

Here is the output:

-1
-1
65535
65535

1
1
1
1
John
  • Now, try this trick with `uVal` being a `char` (or, better yet, a `double`), and see what happens. – Sam Varshavchik May 15 '21 at 14:31
  • @Sam ... or a `float`? – Adrian Mole May 15 '21 at 14:32
  • `(int16_t)uVal` converts the value; `*(int16_t*)&uVal` converts the pointer (the value of the address). The difference may not be discernible with your examples... but try `double x = 4.2; int k = (int)x; int j = *(int*)&x;` (see the sketch after these comments) – pmg May 15 '21 at 14:32
  • @orlp Even converting a `uint16_t` to an `int16_t` is undefined? It's amazing. – John May 15 '21 at 14:32
  • Conversion of a `uint16_t` to `int16_t` is well-defined if the value of the former is properly representable by the latter. In the case of `0xFFFF`, that condition is *not* met. – Adrian Mole May 15 '21 at 14:33
  • @orlp I am amazed that it's undefined. I agree with dbush (https://stackoverflow.com/questions/4975340/int-to-unsigned-int-conversion): "As has been noted in the other answers, the standard actually guarantees that 'the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type)'. So even if your platform did not store signed ints as two's complement, the behavior would be the same." – John May 15 '21 at 14:35
  • @orlp What source? Could you please post the related url? – John May 15 '21 at 14:40
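Following pmg's suggestion above, here is a minimal sketch of the `double` case (the names `x`, `k`, `j` come from that comment; the exact number printed for `j` depends on the platform's floating-point representation, and the pointer-cast access is the kind the answer below discusses):

#include <iostream>

int main()
{
    double x = 4.2;

    int k = (int)x;       // value conversion: 4.2 is truncated to 4
    int j = *(int*)&x;    // reinterprets part of the double's byte pattern
                          // (type punning; the result depends on the
                          // representation and is not a sanctioned access)

    std::cout << k << '\n';   // prints 4
    std::cout << j << '\n';   // prints an unrelated-looking number
}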

1 Answer


`(int16_t)uval` converts the `uint16_t` value to an `int16_t` value. For `1` this works as expected; for `0xffff` it is implementation-defined behavior, because `0xffff` does not fit in the range of `int16_t` (https://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions). Since C++20, the conversion is defined such that it produces the expected value (see below).
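A minimal sketch of this value conversion (the output comments assume a two's-complement platform, which C++20 guarantees):

#include <cstdint>
#include <iostream>

int main()
{
    uint16_t uval = 0xFFFF;   // 65535

    // Value conversion. Since C++20 the result is the unique int16_t value
    // congruent to 65535 modulo 2^16, i.e. 65535 - 65536 = -1.
    // Before C++20 the out-of-range result is implementation-defined
    // (in practice also -1 on two's-complement platforms).
    int16_t sval = (int16_t)uval;
    std::cout << sval << '\n';                  // -1

    // 1 fits in int16_t, so this conversion is well-defined in every standard.
    std::cout << (int16_t)(uint16_t)1 << '\n';  // 1
}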

`*(int16_t*)&uval` first casts the `uint16_t*` pointer to an `int16_t*` pointer, and then dereferences it. With the C-style pointer cast, the expression is equivalent to `*reinterpret_cast<int16_t*>(&uval)` (https://en.cppreference.com/w/cpp/language/explicit_cast). `static_cast` is not possible here because `uint16_t` and `int16_t` are different types. Because they are also not "similar types", dereferencing the resulting `int16_t*` pointer is undefined behavior (https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing).
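For illustration, a sketch of how the C-style pointer cast decomposes into named casts (the function name `demo` is made up for this example):

#include <cstdint>

void demo(uint16_t uval)
{
    // What *(int16_t*)&uval does, spelled out with named casts:
    int16_t a = *reinterpret_cast<int16_t*>(&uval);

    // static_cast cannot convert between these unrelated pointer types:
    // int16_t b = *static_cast<int16_t*>(&uval);   // error: does not compile

    (void)a;   // silence the unused-variable warning
}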


Because it is undefined behavior, the expression could in theory produce anything. With a typical compiler (without optimizations), though, the first expression converts the `uint16_t` value to an `int16_t`, whereas the second expression accesses the raw `uint16_t` object as if it were an `int16_t`, without modifying it.

Both give the same result because of the way signed integer values are stored in two's complement: positive values have the same bit pattern in the signed and the unsigned type; `0x0001` means `1` for both. But `0xffff` (all-one bits) means `65535` for a `uint16_t` and `-1` for an `int16_t`.
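A quick way to see this is to print the bit patterns (a sketch; `std::bitset` is used here only for display):

#include <bitset>
#include <cstdint>
#include <iostream>

int main()
{
    int16_t  s = -1;
    uint16_t u = 65535;

    // Both lines print 1111111111111111: the single 16-bit pattern 0xFFFF
    // reads as -1 through int16_t and as 65535 through uint16_t.
    std::cout << std::bitset<16>((uint16_t)s) << '\n';
    std::cout << std::bitset<16>(u) << '\n';
}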

So `(int16_t)uval` (and likewise `(uint16_t)sval`) does not need to modify the bit pattern at all, because all values that are in the range of both `int16_t` and `uint16_t` are represented the same way in both. And for values outside that range, the result is implementation-defined (before C++20), so the compiler simply leaves the bit pattern unmodified in that case as well.


The only way to get the effect of the second expression (accessing the raw bytes as if they were another type) without undefined behavior is to use `std::memcpy`: `int16_t sval; std::memcpy(&sval, &uval, sizeof sval);` (or, since C++20, `std::bit_cast`; see the comments below).
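A short sketch of both byte-wise approaches, `std::memcpy` and (since C++20) `std::bit_cast`:

#include <bit>       // std::bit_cast (C++20)
#include <cstdint>
#include <cstring>   // std::memcpy
#include <iostream>

int main()
{
    uint16_t uval = 0xFFFF;

    // Copy the object representation into an int16_t (works in any standard).
    int16_t sval;
    std::memcpy(&sval, &uval, sizeof sval);

    // C++20: the same reinterpretation as a single, constexpr-friendly expression.
    int16_t sval2 = std::bit_cast<int16_t>(uval);

    std::cout << sval << ' ' << sval2 << '\n';   // -1 -1 (two's complement)
}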

1201ProgramAlarm
tmlen
  • @tmlen It's amazing that `uint` to `int` is undefined. It seems to be defined behavior: stackoverflow.com/questions/4975340/… ("As has been noted in the other answers, the standard actually guarantees that 'the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type)'. So even if your platform did not store signed ints as two's complement, the behavior would be the same.") – John May 15 '21 at 15:08
  • @tmlean "For 0xffff it is **implementation-defined behavior**"? Could you please explain that in more detail for me? – John May 15 '21 at 15:24
  • The compiler defines what `(int16_t)uval` means if the value `uval` is not in the range of `int16_t`. Because (almost) all systems use two's-complement, compilers simply implement it so that the raw bitwise value is not modified at all (same as the second expression). – tmlen May 15 '21 at 15:30
  • Since C++20 the standard requires that two's complement is used (https://en.cppreference.com/w/cpp/language/types#Range_of_values), and the expression is well-defined: https://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions – tmlen May 15 '21 at 15:35
  • So it returns 65535 + (-1)*2^16 = -1, which is in the range of `int16_t` and which has the same bit pattern (all-one bits) in two's complement. – tmlen May 15 '21 at 15:36
  • @John: The question you link is about converting int to uint. That is defined as you say. It's the other way around that's implementation-defined in C and in C++ < 20. – Nate Eldredge May 15 '21 at 17:18
  • @NateEldredge I see, thank you. So, given the definition `uint16_t uVal;`, `(int16_t)uVal == *(int16_t*)uVal` always returns `true` for C++>=20 whereas it could not be guaranteed for C++ < 20. Am I right? – John May 16 '21 at 02:42
  • @John: No, I said nothing about type-punning, only about conversion. That is, `(uint16_t)-1` is guaranteed to be `65535` on all C++ versions, whereas `(int16_t)65535` is implementation-defined on C++ < 20 and guaranteed to be `-1` on C++20. But `*(int16_t*)uVal` is UB on all of the above. – Nate Eldredge May 16 '21 at 03:07
  • @NateEldredge Sorry, that was a clerical error. What I intended to say is `*(int16_t*)&uVal` rather than `*(int16_t*)uVal`. So, given the definition `uint16_t uVal;`, `(int16_t)uVal == *(int16_t*)&uVal` always returns `true` for C++ >= 20, whereas it cannot be guaranteed for C++ < 20. Am I right? – John May 16 '21 at 03:37
  • Nope, I was also thinking of what you meant. `*(int16_t *)&uVal` is also undefined behavior. You're simply not allowed to cast a pointer and then dereference it. Think of it this way: suppose the compiler needs to generate special instructions to properly convert `uint16_t` to `int16_t`. When you cast the value, as `(int16_t)uVal`, it can see both types at once and generate the right code. When you cast the pointer, it may be dereferenced much later, at a place where the compiler has no idea what the original type was, so it has no hope of inserting the proper code. – Nate Eldredge May 16 '21 at 03:45
  • True, in this case, the dereference is immediate and the compiler could in principle recognize it and do the right conversion. But the standard authors don't want to get in the business of "this works in simple cases but not in complex ones"; they want to specify something consistent. – Nate Eldredge May 16 '21 at 03:46
  • To be absolutely clear, the case where C++ < 20 and C++ >= 20 differ is just `(int16_t)uVal`, which involves no pointers at all. – Nate Eldredge May 16 '21 at 03:49
  • In the C++ standard any variable (even if it is just a raw `uint16_t`) is considered an *object* with a given type that gets created at one point and destroyed at another point. This is described here: https://en.cppreference.com/w/cpp/language/object – tmlen May 18 '21 at 10:04
  • Accessing an object as if it had a different type (by casting a pointer to it, to a different pointer type), is called type-aliasing / type-punning, and is undefined behavior. This is described here: https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing – tmlen May 18 '21 at 10:07
  • The only way to do this (for example getting a `uint32_t` that contains the same bytes as a given `float`) is to use `std::memcpy` (and in C++20, `std::bit_cast`); see the sketch below. Same for accessing the `int16_t` as a `uint16_t` or the reverse. After compilation/optimization it is still a no-op; the program simply uses the same bytes in memory. – tmlen May 18 '21 at 10:10
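A minimal sketch of that last point, getting the bytes of a `float` into a `uint32_t`, assuming (as the `static_assert` checks) that `float` is 32 bits wide:

#include <bit>       // std::bit_cast (C++20)
#include <cstdint>
#include <cstring>   // std::memcpy
#include <iostream>

int main()
{
    static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

    float f = 1.0f;

    // Pre-C++20: copy the bytes.
    uint32_t bits1;
    std::memcpy(&bits1, &f, sizeof bits1);

    // C++20: std::bit_cast produces the same bytes as a single expression.
    uint32_t bits2 = std::bit_cast<uint32_t>(f);

    std::cout << std::hex << bits1 << ' ' << bits2 << '\n';  // 3f800000 3f800000 under IEEE-754
}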