27

I came across this objective question on the C programming language. The output for the following code is supposed to be 0 2, but I don't understand why.

Please explain the initialization process. Here's the code:

#include <stdio.h>

int main()
{
  union a
  {
    int x;
    char y[2];
  };
  union a z = {512};
  printf("\n%d %d", z.y[0], z.y[1]);
  return 0;
}
hopper
Sumit Cornelius
  • What is the unexpected output that you get? – Bergi Jun 05 '15 at 14:41
  • @Peter Mortensen your edit reversed the meaning of the question! It should be `The output is supposed to be 0 2. I dont' get why!` – edc65 Jun 05 '15 at 15:17
  • @edc65 Exactly. I tried to fix that, but my suggested edit http://stackoverflow.com/review/suggested-edits/8336900 was rejected. If you also want to try it, maybe you will have better luck than me... – Fabio says Reinstate Monica Jun 05 '15 at 15:20
  • Note that the output from the printf() statement will be different depending on if the underlying hardware architecture is big Endian or little Endian. The output could be (assuming a 32 bit or 64 bit architecture) (little Endian) '0 2' or (big Endian) '0 0'. – user3629249 Jun 05 '15 at 20:47

5 Answers

19

I am going to assume that you are on a little-endian system where sizeof(int) is 4 bytes (32 bits) and sizeof(char) is 1 byte (8 bits), and where integers are represented in two's complement form. A union is only as large as its largest member (plus any padding), and all of its members share that same piece of memory.

Now, you are writing to this memory an integer value of 512.

512 in binary is 1000000000.

or, written as a 32-bit two's complement value:

00000000 00000000 00000010 00000000.

A little-endian machine stores the least significant byte at the lowest address, so reading memory from the lowest address upward you get:

00000000 00000010 00000000 00000000
|______| |______|
   |         |
  y[0]      y[1]

Now see what happens when you access this memory through the indices of the char array.

Thus, y[0] is 00000000 which is 0,

and y[1] is 00000010 which is 2.

Arjun Sreedharan
  • Thanks Arjun. I just didn't think about the binary representation. Now I get it. :) – Sumit Cornelius Jun 05 '15 at 09:31
  • Will accessing `y` beyond its bounds, but still inside the union (e.g. `y[3]`) be undefined behavior, similarly to normal array accesses? – nanofarad Jun 05 '15 at 10:35
  • @hexafraction AFAIK unless `sizeof(int)` < `3 * sizeof(char)`, `y[3]` is legal since it's just `*(y + 3)` and `y + 3` is a legal address occupied by the union. – Arjun Sreedharan Jun 05 '15 at 11:03
  • @haccks no that's not the question. Probably you looked at the edited question. The question isn't very clear. But I assumed it's _Why is the output for the following code supposed to be `0 2`_ ? and I think OP's comment here vindicates my assumption. – Arjun Sreedharan Jun 05 '15 at 17:48
8

The memory allocated for the union is the size of the largest type in the union, which is int in this case. Let's say the size of int on your system is 2 bytes; then

512 will be 0x200.

The representation looks like:

0000 0010 0000 0000
|        |         |
------------------- 
Byte 1     Byte 0

So, on little-endian systems, the first byte in memory (byte 0, the least significant) is 0 and the second one (byte 1) is 2.

char is one byte on all systems.

So the access z.y[0] and z.y[1] is per byte access.

z.y[0] = 0000 0000 = 0
z.y[1] = 0000 0010 = 2

I am just showing how the memory is allocated and how the value is stored. You need to consider the points below, since the output depends on them.

Points to be noted:

  1. The output is completely system dependent.
  2. The endianness and sizeof(int) both matter, and both vary across systems.

PS: Both members of a union occupy the same memory.

Gopi
  • Depends on the byte order. – Werner Henze Jun 05 '15 at 08:21
  • Try this: `union a z={0x12345678}; printf("\n%x %x",z.y[0],z.y[1]);` the output will be more demonstrative. – Jabberwocky Jun 05 '15 at 08:22
  • @WernerHenze: indeed, it only prints `0 2` on little endian systems. – dummydev Jun 05 '15 at 08:22
  • It's worth mentioning, this assignment works because both the members of the union occupy the same location in memory – Sinkingpoint Jun 05 '15 at 08:23
  • @Quirliom that depends on sizeof(int). A 32bit int on big endian system will produce `0 0` as output. – user268396 Jun 05 '15 at 08:24
  • @user268396 I'm not saying anything about the result, only the fact that you can assign the int and access the chars. – Sinkingpoint Jun 05 '15 at 08:25
  • Downvoting, for now: I doubt that the original poster will understand what happens from your answer. As it is, it just throws a couple of important buzzwords at the reader, without explaining them, multiple times (You wrote *three times* that byte order counts but didn't link to a page that explains what it is once..), and in a completely unstructured order, too. – Phillip Jun 05 '15 at 08:36
  • @Phillip ok now i added the link to endianess explanation – Gopi Jun 05 '15 at 08:39
  • Now I get it. Thank you everyone. – Sumit Cornelius Jun 05 '15 at 09:29
  • @SumitCornelius no probs – Gopi Jun 05 '15 at 09:34
  • @KlasLindbäck: "Undefined behavior" and "implementation dependent" are [two very different terms defined by the C-standard](http://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior) . Without checking, I believe this is implementation dependent. – BlueRaja - Danny Pflughoeft Jun 05 '15 at 18:09
8

The standard says that

6.2.5 Types:

A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type.

The compiler allocates only enough space for the largest of the members, which overlay each other within this space. In your case, memory is allocated for int data type (assuming 4-bytes). The line

union a z = {512};

will initialize the first member of union z, i.e. x becomes 512. In binary it is represented as 0000 0000 0000 0000 0000 0010 0000 0000 on a 32-bit machine.

The memory representation depends on the machine architecture. With a 4-byte int it will look either like this (the least significant byte stored at the smallest address -- Little Endian)

Address     Value
0x1000      0000 0000
0x1001      0000 0010
0x1002      0000 0000 
0x1003      0000 0000

or like (store the most significant byte in the smallest address -- Big Endian)

Address     Value
0x1000      0000 0000
0x1001      0000 0000
0x1002      0000 0010 
0x1003      0000 0000

z.y[0] accesses the content at address 0x1000 and z.y[1] accesses the content at address 0x1001, and those contents depend on which of the above representations is used.
It seems that your machine uses the Little Endian representation, and therefore z.y[0] = 0 and z.y[1] = 2, and the output is 0 2.

But, you should note that footnote 95 of section 6.5.2.3 states that

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

Community
haccks
1

The size of the union is determined by its largest member. So, here it is the size of int.

Assuming 4 bytes per int and 1 byte per char, we can say: sizeof(union a) = 4 bytes.

Now, let's see how it is actually stored in memory:

For example, suppose an instance of union a is stored at addresses 2000-2003 on a little-endian machine:

  • 2000 -> last(4th / least significant / rightmost) byte of int x, y[0]

  • 2001 -> 3rd byte of int x, y[1]

  • 2002 -> 2nd byte of int x

  • 2003 -> 1st byte of int x (most significant)

Now, when you set z.x = 512:

since 512 = 0x00000200,

  • M[2000] = 0x00

  • M[2001] = 0x02

  • M[2002] = 0x00

  • M[2003] = 0x00

So, when you print y[0] and y[1], it prints the data at M[2000] and M[2001], which are 0 and 2 in decimal, respectively.

Peter Mortensen
shreyans800755
0

For automatic (non-static) variables, the brace initializer sets the first member of the union, so the initialization is equivalent to this assignment:

union a z;
z.x = 512;
i486