Why does casting unsigned to signed directly in C give the correct result?

Question

In C, signed integer and unsigned integer are stored differently in memory. C also converts signed integer and unsigned integer implicitly when the types are clear at runtime. However, when I try the following snippet,

#include <stdio.h>

int main() {    
    unsigned int a = 5;
    signed int b = a;
    signed int c = *(unsigned int*)&a;
    signed int d = *(signed int*)&a;

    printf("%u\n", a);
    printf("%i\n", b);
    printf("%i\n", c);
    printf("%i\n", d);

    return 0;
}

with the expected output of:

5
5                   //Implicit conversion occurs
5                   //Implicit conversion occurs, because it knows that *(unsigned int*)&a is an unsigned int
[some crazy number] //a is cast directly to signed int without conversion

However, in reality, it outputs

5
5
5
5

Why?

It's because, at least in your case, "signed integer and unsigned integer are" ***NOT*** "stored differently in memory". — Sam Varshavchik, Dec 01 '18 at 04:03
"In C, signed integer and unsigned integer are stored differently in memory." I don't believe that — Skriptkiddie, Dec 01 '18 at 04:06
Also see [What is the strict aliasing rule](https://stackoverflow.com/a/51228315/1708801) — Shafik Yaghmour, Dec 01 '18 at 04:15
@Skriptkiddie [§6.2.6.2](https://port70.net/~nsz/c/c11/n1570.html#6.2.6.2) — Swordfish, Dec 01 '18 at 04:20
In this case signed int and unsigned int are most certainly not stored differently. A signed int is just an unsigned value with 2's complement notation, i.e. the numbers 0 to 2^31-1 will be stored exactly the same in signed and unsigned notation. The difference is that instead of continuing from there as in the unsigned case, a signed int will use the values 2^31 to 2^32-1 to represent the negative range of -2^31 to -1. In your case, the number 5 would be stored as 0x00000005 in both the signed and unsigned data types. — MikeFromCanmore, Dec 01 '18 at 04:24
As long as the number is between 0 and INT_MAX, the value bits of an `unsigned int` and a `signed int` are exactly the same. — user3386109, Dec 01 '18 at 04:26
This is NOT a violation of the *Strict Aliasing Rule*, see [C11 Standard - 6.5 Expressions(p7) bullet 3](https://port70.net/~nsz/c/c11/n1570.html#6.5p7) — David C. Rankin, Dec 01 '18 at 04:33
Typically so, @MikeFromCanmore, but in fact two's complement representation is only one of three styles of signed-integer representation specifically allowed by the C standard. You're unlikely to see a different one these days, but historically there has indeed been a variety of representations in real-world use. (The others allowed by the standard are ones complement and sign/magnitude.) — John Bollinger, Dec 01 '18 at 04:52
Possible duplicate of [When is casting between pointer types not undefined behavior in C?](https://stackoverflow.com/q/4810417/608639), [Undefined behavior with type casting?](https://stackoverflow.com/q/37631837/608639), etc. For the C++ tag, also see questions like [Why is casting from char to std::byte potentially undefined behavior?](https://stackoverflow.com/q/52554069/608639) — jww, Dec 01 '18 at 05:36

John Bollinger · Accepted Answer · 2018-12-01T04:55:48.530

Your claim that ...

In C, signed integer and unsigned integer are stored differently in memory

... is largely wrong. The standard instead specifies:

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M <= N ). If the sign bit is zero, it shall not affect the resulting value.

(C2011 6.2.6.2/2; emphasis added)

Thus, although the representation of a signed integer type and its corresponding unsigned integer type (which have the same size) must differ at least in that the former has a sign bit and the latter does not, most bits of the representations in fact correspond exactly. The standard requires it. Small(ish), non-negative integers will be represented identically in corresponding signed and unsigned integer types.

Additionally, some of the comments raised the matter of the "strict aliasing rule", which is paragraph 6.5/7 of the standard. It forbids accessing an object of one type via an lvalue of a different type, as your code does, but it allows some notable exceptions. One of the exceptions is that you may access an object via an lvalue whose type is

a type that is the signed or unsigned type corresponding to the effective type of the object,

That is in fact what your code does, so there is no strict-aliasing violation there.

Yeah, the duplicate link to the strict aliasing topic really left me baffled! — Skriptkiddie, Dec 01 '18 at 04:50
@Skriptkiddie Sowwy! It's easy to forget that exception from the general rule. I referred you to the link about integer representation because you wrote "I believe". Just wanted to give you a reference to base your (correct) beliefs on ;) — Swordfish, Dec 01 '18 at 05:09
"Small(ish), non-negative integers will be represented identically in corresponding signed and unsigned integer types." I believe that applies to values `[0....INT_MAX]` — chux - Reinstate Monica, Dec 01 '18 at 07:42

Skriptkiddie · Answer 2 · 2018-12-01T04:51:53.300

Contrary to the explanation in the comment section, I still want to try to argue that your integers are ALL stored the same way in memory. I am happy to revise my answer, but at this time I still do not believe that unsigned/signed ints are stored differently in memory [actually, I know it ^^].

Test Program:

#include <cstdio>     // printf
#include <exception>  // std::terminate

int main() {    
    unsigned int a = 5;
    signed int b = a;
    signed int c = *(unsigned int*)&a;
    signed int d = *(signed int*)&a;

    printf("%u\n", a);
    printf("%i\n", b);
    printf("%i\n", c);
    printf("%i\n", d);
    std::terminate();

    return 0;
}

Compile it using: g++ -O0 -g test.cpp

Run it in GDB: gdb ./a.out

Once std::terminate is called, we can examine the raw memory:

(gdb) print/t main::a
$9 = 101
(gdb) print/t main::b
$10 = 101
(gdb) print/t main::c
$11 = 101
(gdb) print/t main::d
$12 = 101
(gdb)

The integers are all stored the same way, be it unsigned or signed int. The only difference is how they are interpreted once an unsigned int greater than INT_MAX gets cast to a signed int. This cast, however, will also not alter the memory at all.
