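#include <stdio.h>

int main(void)
{
    union {
        char i[2];
        struct {
            short age;
        } myStruct;
    } myUnion;
    myUnion.i[0] = 'A';
    myUnion.i[1] = 'B';
    /* Read the same two bytes back through the other member. */
    printf("%x ", myUnion.myStruct.age);
    return 0;
}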

So I understand that a union only holds enough space for its largest member. In this case, the char array "i" and the struct "myStruct" appear to be the same size, so the union would hold just two bytes, containing the characters 'A' and 'B'. However, what would happen if you tried to read the struct member "age" at that point?

Mikey Chen
  • Why don't you try it? You'd see a value based on the values of `'A'` (0x61) and `'B'` (0x62), either 0x6162 or 0x6261, depending on whether it is a big-endian or little-endian machine. Beware, nominally you're treading into undefined behaviour. In practice, no compiler writer has yet been mad enough to break such code. – Jonathan Leffler Apr 02 '15 at 22:26
  • @JonathanLeffler: It's no longer UB as of C99. – Dietrich Epp Apr 02 '15 at 22:26
  • @DietrichEpp: can you specify chapter and verse for that? Section 6.7.2.1/16 says _The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa._ The 'value of at most one member' part implies that there is still some restriction. What lets you access the value that wasn't stored last? – Jonathan Leffler Apr 02 '15 at 22:27
  • Oh flip; my first comment uses the hex codes for `'a'` and `'b'` instead of `'A'` and `'B'`, which should be 0x41 and 0x42. Adjust the rest accordingly. – Jonathan Leffler Apr 02 '15 at 22:32
  • @JonathanLeffler: n1570 §6.2.6.1 paragraph 7. – Dietrich Epp Apr 02 '15 at 22:32
  • @JonathanLeffler: Better yet, see this question: http://stackoverflow.com/questions/11639947/is-type-punning-through-a-union-unspecified-in-c99-and-has-it-become-specified – Dietrich Epp Apr 02 '15 at 22:36
  • @DietrichEpp - if I recall, a provision was made only when a union contains structures that begin with the same members. – teppic Apr 02 '15 at 22:36
  • OK: 6.2.6.1 Representation of types, para 7: _When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values._ Sounds promising...looking at the C90 standard to find the contrary behaviour. – Jonathan Leffler Apr 02 '15 at 22:36
  • @teppic: Under those circumstances, the C standard requires the common leading members to take the same values. However, that is a different issue—the C standard also permits type punning, but does not state what result you will get. See the linked question. – Dietrich Epp Apr 02 '15 at 22:38
  • @DietrichEpp I was just looking it up after saying that, and I noticed the footnote: 'If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.' – teppic Apr 02 '15 at 22:43
  • @teppic: That's a good point, but it's a serious challenge to find a system with a trap representation for integers. I believe that such a system existed, but if I recall, the trap representations were for signed `long` which were implemented by gluing two signed `short` together, and the system wasn't two's-complement. This means that union type punning is never UB on systems without trap representations, which is just about every system these days. – Dietrich Epp Apr 02 '15 at 22:49

1 Answer


In the past, this was "undefined behavior" and could theoretically crash your system or worse. However, programmers did it anyway, and C99 codified it (see Is type-punning through a union unspecified in C99, and has it become specified in C11?): the standard allows you to do it but doesn't say what the results will be or whether they make sense at all.

So,

  • On modern 8-bit-byte 16-bit-short little-endian systems it will print 4241,

  • On modern 8-bit-byte 16-bit-short big-endian systems it will print 4142,

  • If sizeof(short) > 2, you have a problem, because only the first two bytes of age were written and the rest have unspecified values (but such systems are very rare),

  • You will get different results on EBCDIC systems (which you don't use or care about), where 'A' is 0xC1 and 'B' is 0xC2,

  • You will get different results on non-8-bit-byte systems (which you don't use or care about),

  • You could invoke undefined behavior if your program creates a trap representation for a short... however, modern systems do not have trap representations for integers.

Dietrich Epp