
My code:

#include<stdio.h>

union U{
    int x;
    char y;
};

int main()
{
    union U u1;
    u1.x = 258;
    u1.y = '0';
    printf("%d%d",u1.x,u1.y);
    return 0;
}

Strangely, the output is 30448.

Can someone please explain how this happens?

anastaciu
  • What were you expecting instead? – Barmar Oct 19 '20 at 15:17
  • It's only portable to read the last member of a union that was assigned. – Barmar Oct 19 '20 at 15:17
  • Note too that `'0'` is not the same thing as `0`, which may be a factor in your confusion. – John Bollinger Oct 19 '20 at 15:18
  • It is undefined behaviour, but *something* happened. The `258` is hex `102` so after writing that the four bytes contained `02 01 00 00` (little-endian). You then overwrote the first byte with `'0'` which is hex `30`. So the four bytes are now `30 01 00 00` and hex `130` is decimal `304`. But because you did not output a space, the next value hex `30` decimal `48` was run in to it, and the output `304 48` is shown as `30448`. – Weather Vane Oct 19 '20 at 15:27
  • Does this answer your question? [Purpose of Unions in C and C++](https://stackoverflow.com/questions/2310483/purpose-of-unions-in-c-and-c) – ecoplaneteer Oct 20 '20 at 01:44

2 Answers

You may be misunderstanding the purpose of a union. A union stores only one value at a time, but that value can have any of the member types. The last value stored overwrites the previous one.

In your case u1.y, which holds '0', is the last value stored (recall that the 1-byte ASCII code of the character '0' is 48 in decimal). It accounts for the last 2 digits of your output, because %d prints '0' as its ASCII code.

As for the first part of the output, note that you overwrite the int value 258, which is presumably 4 bytes wide (for the sake of explanation I will assume it is 2 bytes), with the 1-byte-wide char value 48.

The binary value of 258 (assuming a 2-byte-wide int) is:

|0|0|0|0|0|0|0|1|0|0|0|0|0|0|1|0|

|   2nd byte    |   1st byte    |              

The binary value of 48 (a 1-byte-wide char) is:

| | | | | | | | |0|0|1|1|0|0|0|0|

                |   1st byte    | 

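On a little-endian machine (by far the most common case) the char member shares the int's least significant byte, so when you overwrite the two-byte union variable with a one-byte value, only the 8 least significant bits (the least significant byte) are overwritten, and you end up with: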

|0|0|0|0|0|0|0|1|x|x|x|x|x|x|x|x|
| | | | | | | | |0|0|1|1|0|0|0|0|

|0|0|0|0|0|0|0|1|0|0|1|1|0|0|0|0|

And this is the binary representation of 304.

So your code first prints the int 304 (2 bytes wide for the sake of the example) and then 48, the ASCII code of '0' (the char is promoted to int when passed to printf). Because the format string "%d%d" puts nothing between the two numbers, they run together as 30448.

Note that this behavior is not undefined.

ISO/IEC 9899:2017 N2176

§ 6.5.2.3

97) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called “type punning”). This might be a trap representation.

§ 6.2.6.1

6 - When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

7 - When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

For confirmation you can use:

printf("%p %p\n", (void*)&u1.x, (void*)&u1.y);

This prints the memory addresses of both u1.x and u1.y, and you will find that they are the same.

anastaciu
  • For 2 **or more** byte **little-endian** `int`. For people unaware of computers other than consumer desktops and laptops and similar, 4-byte LE is the most likely by far. – dave_thompson_085 Oct 21 '20 at 22:45
  • @dave_thompson_085 yours is an interesting comment and true too. In any case I'm not trying to accurately portray byte ordering, just a comprehensible visualization of the sequence of events that leads to the output, regardless of endianness, which I believe could confuse the OP further. – anastaciu Oct 22 '20 at 11:22

You cannot read a member of a union other than the one last written into:

#include <stdio.h>

union U {
    int x;
    char y;
};

int main()
{
    union U u1;
    u1.x = 258; // write into member x: OK
    u1.y = '0'; // write into member y: OK
    printf("%d%d",u1.x,u1.y); // read both members x and y: WRONG
                              // can only read member y
    return 0;
}
pmg
  • There is no rule in the C standard against reading a union member other than the last one modified, C 2018 6.5.2.3 says that reading a member produces the value of the named member, and note 99 explains the appropriate part of the object representation is reinterpreted as an object representation in the new type. – Eric Postpischil Oct 19 '20 at 20:20
  • [C11 J.1](http://port70.net/~nsz/c/c11/n1570.html#J.1) -- **portability issues** -- The values of bytes that correspond to union members other than the one last stored into [is unspecified]. And there are a [few](http://port70.net/~nsz/c/c11/n1570.html#note46) [footnotes](http://port70.net/~nsz/c/c11/n1570.html#note95) [throughout](http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p16) the Standard that make the issue kinda murky @EricPostpischil – pmg Oct 19 '20 at 20:44
  • Yes, it is not portable. But you can do it. And the resulting behavior is specified: The appropriate bytes of the union are reinterpreted as the new type. The statement “You cannot *read* a member of a union other than the last one *written* into” is false. – Eric Postpischil Oct 19 '20 at 20:46
  • @EricPostpischil+ technically writing a text stream (which stdout is) with the last or only (pseudo)line not terminated by `\n` is impl-def and arguably may be undefined :-} – dave_thompson_085 Oct 21 '20 at 22:44