-1

In C, signed and unsigned integers are stored differently in memory. C also converts between signed and unsigned integers implicitly when the types are known at compile time. However, when I try the following snippet,

#include <stdio.h>

int main() {    
    unsigned int a = 5;
    signed int b = a;
    signed int c = *(unsigned int*)&a;
    signed int d = *(signed int*)&a;

    printf("%u\n", a);
    printf("%i\n", b);
    printf("%i\n", c);
    printf("%i\n", d);

    return 0;
}

with the expected output of:

5
5                   //Implicit conversion occurs
5                   //Implicit conversion occurs, because it knows that *(unsigned int*)&a is an unsigned int
[some crazy number] //a is cast directly to signed int without conversion

However, in reality, it outputs

5
5
5
5

Why?

doge99
  • 188
  • 8
  • 10
    It's because, at least in your case, "signed integer and unsigned integer are" ***NOT*** "stored differently in memory". – Sam Varshavchik Dec 01 '18 at 04:03
  • 2
    "In C, signed integer and unsigned integer are stored differently in memory." I don't believe that – Skriptkiddie Dec 01 '18 at 04:06
  • 1
    Also see [What is the strict aliasing rule](https://stackoverflow.com/a/51228315/1708801) – Shafik Yaghmour Dec 01 '18 at 04:15
  • @Skriptkiddie [§6.2.6.2](https://port70.net/~nsz/c/c11/n1570.html#6.2.6.2) – Swordfish Dec 01 '18 at 04:20
  • In this case signed int and unsigned int are most certainly not stored differently. A signed int is just an unsigned value with 2's complement notation, i.e. the numbers 0 to 2^31-1 will be stored exactly the same in signed and unsigned notation. The difference is that instead of continuing from there as in the unsigned case, a signed int will use the values 2^31 to 2^32-1 to represent the negative range of -2^31 to -1. In your case, the number 5 would be stored as 0x00000005 in both the signed and unsigned data types. – MikeFromCanmore Dec 01 '18 at 04:24
  • 4
    As long as the number is between 0 and INT_MAX, the value bits of an `unsigned int` and a `signed int` are exactly the same. – user3386109 Dec 01 '18 at 04:26
  • 5
    This is NOT a violation of the *Strict Aliasing Rule*, see [C11 Standard - 6.5 Expressions(p7) bullet 3](https://port70.net/~nsz/c/c11/n1570.html#6.5p7) – David C. Rankin Dec 01 '18 at 04:33
  • Typically so, @MikeFromCanmore, but in fact two's complement representation is only one of three styles of signed-integer representation specifically allowed by the C standard. You're unlikely to see a different one these days, but historically there has indeed been a variety of representations in real-world use. (The others allowed by the standard are ones complement and sign/magnitude.) – John Bollinger Dec 01 '18 at 04:52
  • Possible duplicate of [When is casting between pointer types not undefined behavior in C?](https://stackoverflow.com/q/4810417/608639), [Undefined behavior with type casting?](https://stackoverflow.com/q/37631837/608639), etc. For the C++ tag, also see questions like [Why is casting from char to std::byte potentially undefined behavior?](https://stackoverflow.com/q/52554069/608639) – jww Dec 01 '18 at 05:36
  • Why do you have the c++ tag? – JVApen Dec 01 '18 at 06:54

2 Answers

7

Your claim that ...

In C, signed integer and unsigned integer are stored differently in memory

... is largely wrong. The standard instead specifies:

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. **Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type** (if there are M value bits in the signed type and N in the unsigned type, then M <= N). If the sign bit is zero, it shall not affect the resulting value.

(C2011 6.2.6.2/2; emphasis added)

Thus, although the representation of a signed integer type and its corresponding unsigned integer type (which have the same size) must differ at least in that the former has a sign bit and the latter does not, most bits of the representations in fact correspond exactly. The standard requires it. Small(ish), non-negative integers will be represented identically in corresponding signed and unsigned integer types.

Additionally, some of the comments raised the matter of the "strict aliasing rule", which is paragraph 6.5/7 of the standard. It forbids accessing an object of one type via an lvalue of a different type, as your code does, but it allows some notable exceptions. One of the exceptions is that you may access an object via an lvalue whose type is

  • a type that is the signed or unsigned type corresponding to the effective type of the object,

That is in fact what your code does, so there is no strict-aliasing violation there.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • Yeah, the duplicate link to the strict aliasing topic really left me baffled! – Skriptkiddie Dec 01 '18 at 04:50
  • @Skriptkiddie Sowwy! It's easy to forget that exception from the general rule. I referred you to the link about integer representation because you wrote "I believe". Just wanted to give you a reference to base your (correct) beliefs on ;) – Swordfish Dec 01 '18 at 05:09
  • 1
    "Small(ish), non-negative integers will be represented identically in corresponding signed and unsigned integer types." I believe that applies to values `[0....INT_MAX]` – chux - Reinstate Monica Dec 01 '18 at 07:42
0

Contrary to the explanation in the comment section, I still want to try to argue that your integers are ALL stored the same way in memory. I am happy to revise my answer, but at this time I still do not believe that unsigned/signed ints are stored differently in memory [actually, I know it ^^].

Test Program:

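#include <cstdio>     // printf
#include <exception>  // std::terminate

int main() {
    unsigned int a = 5;
    signed int b = a;                   // implicit conversion
    signed int c = *(unsigned int*)&a;  // read as unsigned, then convert
    signed int d = *(signed int*)&a;    // read the same bytes as signed

    printf("%u\n", a);
    printf("%i\n", b);
    printf("%i\n", c);
    printf("%i\n", d);
    std::terminate();  // abort here so the locals can be inspected in GDB

    return 0;
}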

Compile it using: g++ -O0 -g test.cpp

Run it in GDB: gdb ./a.out

Once std::terminate is called, we can examine the raw memory:

(gdb) print/t main::a
$9 = 101
(gdb) print/t main::b
$10 = 101
(gdb) print/t main::c
$11 = 101
(gdb) print/t main::d
$12 = 101
(gdb) 

The integers are all stored the same way, whether the type is unsigned int or signed int. The only difference is how the bits are interpreted once an unsigned int greater than INT_MAX gets cast to a signed int. On common two's complement implementations, this cast will not alter the memory at all either.

Skriptkiddie
  • 411
  • 2
  • 7