
A number of answers for the Stack Overflow question Getting the IEEE Single-precision bits for a float suggest using a union structure for type punning (e.g.: turning the bits of a float into a uint32_t):

```c
union {
    float f;
    uint32_t u;
} un;
un.f = your_float;
uint32_t target = un.u;
```
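For reference, here is a self-contained sketch of the snippet above, next to the `memcpy` alternative that avoids the union question entirely. The function names `float_bits_union` and `float_bits_memcpy` are made up for illustration. The code assumes `float` is a 4-byte IEEE 754 single, as on mainstream platforms, and `_Static_assert` requires C11.

```c
#include <stdint.h>
#include <string.h>

/* Type punning through a union, as in the question.
   Assumes sizeof(float) == sizeof(uint32_t) == 4. */
uint32_t float_bits_union(float value)
{
    union {
        float f;
        uint32_t u;
    } un;
    un.f = value;
    return un.u; /* reads a member other than the one last stored */
}

/* Alternative: copy the object representation byte by byte.
   memcpy only accesses bytes, so there is no question about
   which union member is "active". */
uint32_t float_bits_memcpy(float value)
{
    uint32_t u;
    _Static_assert(sizeof u == sizeof value, "float and uint32_t differ in size");
    memcpy(&u, &value, sizeof u);
    return u;
}
```

On an IEEE 754 platform both functions return `0x3F800000` for `1.0f`.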

However, the value of the uint32_t member of the union appears to be unspecified according to the C99 standard (at least draft n1124), where section 6.2.6.1.7 states:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

At least one footnote of the C11 n1570 draft seems to imply that this is no longer the case (see footnote 95 in 6.5.2.3):

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

However, the text of section 6.2.6.1.7 is the same in the C99 draft as in the C11 draft.

Is this behavior actually unspecified under C99? Has it become specified in C11? I realize that most compilers seem to support this, but it would be nice to know if it's specified in the standard, or just a very common extension.

sfstewman
  • Technical note: Accessing a union member other than the last one stored does not cause a program to violate the C standard. Accessing such a union member results in an unspecified value (not undefined behavior), and, per C 1999 4 3, “shall be a correct program and act in accordance with 5.1.2.3.” Further, a compiler may provide additional guarantees about the value and remain a conforming implementation. – Eric Postpischil Jul 24 '12 at 22:21
  • Basically what Wug said. The change is that C99 nowhere explicitly mentions that reading members other than the one last written to is okay, while C11 (at least the draft n1570) does. So by "Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ _or by the omission of any explicit definition of behavior_." it was sometimes stated that that was undefined behaviour. I'm not enough of a language lawyer to cast a definitive verdict on that interpretation. – Daniel Fischer Jul 24 '12 at 22:24
  • This is not a recent addition, but already appears in n1256. And this had been a modification as a result of a defect report: the intention had always been the one that is expressed now. – Jens Gustedt Jul 24 '12 at 22:29
  • @EricPostpischil: Modified "violate" to "be unspecified behavior". Does this address your concern? – sfstewman Jul 24 '12 at 22:30
  • @sfstewman, there is no such thing like "unspecified behavior". There are only unspecified values. Here the unspecified values are those bytes that extend the type that you are writing to, if any. – Jens Gustedt Jul 24 '12 at 22:34
  • @Daniel Fischer: C 1999 does say that reading a member other than the last one written is “okay.” It says this results in an unspecified value (6.2.6.1), and, per my note above, this is a correct program (as long as it is otherwise correct). There is no undefined behavior here. (Unspecified values are not undefined behavior: Undefined behavior is not limited by the standard. For unspecified values, the standard is imposing limits: the behavior must be as if the expression has some value.) – Eric Postpischil Jul 24 '12 at 22:38
  • @DanielFischer: Both the n1124 and n1570 drafts explicitly list as unspecified: "The value of a union member other than the last one stored into (6.2.6.1)" in Appendix J (portability issues). To me, this seems to imply that there could exist a C99 (or C11) compiler where using a union for type punning does not do what we would expect. – sfstewman Jul 24 '12 at 22:39
  • @JensGustedt: “Unspecified behavior” is defined in 3.4.4, to mean use of an unspecified value or other behavior which may have more than one possibility. – Eric Postpischil Jul 24 '12 at 22:40
  • Read it again, it says that those **bytes** that correspond to another member and not to the one that was written to have unspecified value. This implies that the bytes that correspond to that member (so those that are common to both) have a specific value, namely the one that was written. This para is only there to explain what happens (or not) to the bytes that are not written, that's all. – Jens Gustedt Jul 24 '12 at 22:45
  • @sfstewman, appendix J is not normative. – Jens Gustedt Jul 24 '12 at 22:46
  • @EricPostpischil I don't see any explicit mention of reading a member there. It says the bytes not corresponding to the member last written to but to other members have unspecified values. That's an indication that reading other members is okay, and just may result in an unspecified value, but it's not explicitly stated that doing that is allowed. In n1570, it is explicitly stated (but, footnotes are not normative, so one could argue). – Daniel Fischer Jul 24 '12 at 22:52
  • @sfstewman As Jens said, as long as the member you read does not use bytes outside the object representation of the member last stored, the footnote explicitly says you get the bytes from the member last stored. – Daniel Fischer Jul 24 '12 at 22:55
  • @DanielFischer: First, as Jens Gustedt points out, reading a member other than the one written is not unspecified behavior if it has the same size. Per 6.2.6.1 2, either the standard or the implementation must specify the number, order, and encoding of the bytes of an object. So, in any single instance, when you write one member and read another member of the same size, only one value is possible. Second, we learn from 6.2.6.1 that reading a member of a larger size results in an unspecified value, from bytes that are not part of the member originally written. Then, 4 1 tells us this is... – Eric Postpischil Jul 24 '12 at 23:10
  • still a correct program, if nothing else about it renders it incorrect. – Eric Postpischil Jul 24 '12 at 23:11
  • @EricPostpischil I don't disagree. But since the old standard never explicitly said what happened when reading from a member other than the last stored, some people said it was undefined behaviour by omission of defining the behaviour. I've read that often. – Daniel Fischer Jul 24 '12 at 23:23
  • @DanielFischer Indeed, anything not explicitly defined is **un** defined, by definition. The issue is delicate, as the standard is not a formal document. – curiousguy Nov 11 '13 at 23:32
  • @EricPostpischil: If between the write of the first value and read of the second, code were to examine the bytes occupied by the field of the union, the standard would indicate what those bytes must contain. I don't know that anything in the old standard would prevent the compiler from e.g. optimizing a `float` within a union to an FPU register and its overlayed `int` to a CPU register, and reading/writing those registers to/from memory only when forced to by `char*` aliasing rules. – supercat Jul 09 '14 at 18:34
  • See also: [Portability of using union for conversion](https://stackoverflow.com/q/67206482/4561887). – Gabriel Staples May 02 '21 at 21:13

4 Answers


The behavior of type punning with union changed from C89 to C99. The behavior in C99 is the same as C11.

As Wug noted in his answer, type punning is allowed in C99 / C11. An unspecified value that could be a trap is read when the union members are of different size.

The footnote was added in C99 after Clive D.W. Feather's Defect Report #257:

Finally, one of the changes from C90 to C99 was to remove any restriction on accessing one member of a union when the last store was to a different one. The rationale was that the behaviour would then depend on the representations of the values. Since this point is often misunderstood, it might well be worth making it clear in the Standard.

[...]

To address the issue about "type punning", attach a new footnote 78a to the words "named member" in 6.5.2.3#3: 78a If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

The wording of Clive D.W. Feather was accepted for a Technical Corrigendum in the answer by the C Committee for Defect Report #283.
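The footnote's phrase "the appropriate part of the object representation" matters when the members differ in size. A minimal sketch, using a hypothetical union of a `uint32_t` and a `uint16_t`: after storing the wider member, reading the narrower one yields the bytes at the start of the union, so which half of the value you see depends on the platform's byte order. The names `wide_narrow` and `low_address_half` are illustrative only.

```c
#include <stdint.h>

/* Hypothetical union with members of different sizes. */
union wide_narrow {
    uint32_t w;
    uint16_t h;
};

/* Store the 32-bit member, then read the 16-bit one. h is reinterpreted
   from the first two bytes of w's representation, so the result depends
   on byte order. uint16_t has no padding bits, so there is no trap
   representation to worry about here. */
uint16_t low_address_half(uint32_t value)
{
    union wide_narrow x;
    x.w = value;
    return x.h;
}
```

On a little-endian machine such as x86-64, `low_address_half(0x01020304u)` returns `0x0304`; on a big-endian machine it returns `0x0102`.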

ouah
  • The DR is unclear, and a footnote is not normative and can only explain what is defined elsewhere. Also, the DR really does not clarify anything. The issue is confused because the WG is confused. (Also, Wug is wrong on the meaning of "type punning".) – curiousguy Nov 11 '13 at 23:34
  • The quoted text doesn't seem to back up your conclusion. *This might be a trap representation.* it says. – andrewrk Aug 05 '16 at 19:58
  • @andrewrk My conclusion is that type punning is allowed in C99 and C11. The fact that you can read a trap representation for the other member after writing to the member does not change this conclusion. It means on some systems with some specific values you can invoke undefined behavior. Analogously if you use the `*` binary operator with some specific operand values you are also prone to undefined behavior (signed integer overflow) which does not mean the operator is UB *per se* or is not allowed to be used. – ouah Aug 05 '16 at 23:51
  • @curiousguy My opinion is that their choice to go through the footnote was not the best way to clear up the confusion. They should also have modified 6.2.6.1p7 (in C99) to make things more clear and normative. – ouah Aug 05 '16 at 23:52
  • You can get a trap representation even if they're the *same* size. Not sure why you single it out for the different-size case. – Peter Cordes Oct 03 '16 at 17:30

The original C99 specification left this unspecified.

One of the technical corrigenda to C99 (TR2, I think) added footnote 82 to correct this oversight:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

That footnote is retained in the C11 standard (it's footnote 95 in C11).

Stephen Canon
  • I think we have hashed out above, in comments to the question, that this is specified by the 1999 C standard, for members of the same size, at least to the point that a conforming implementation is required to define information sufficient to deduce the value. – Eric Postpischil Jul 24 '12 at 23:20
  • @EricPostpischil "_conforming implementation is required to define information sufficient to deduce the value_" where is that required? – curiousguy Nov 11 '13 at 23:35
  • @curiousguy: As stated in the comments to this question, C 1999 6.2.6.1 2 (and the same paragraph in C 2011) states the number, order, and encoding of bytes that form objects are either explicitly specified (by the standard) or defined by the implementation. – Eric Postpischil Nov 11 '13 at 23:56
  • @EricPostpischil Thank you. (And I find that fact a bit surprising.) – curiousguy Nov 12 '13 at 00:43
  • Unfortunately, whoever wrote that failed to consider that the lack of a defined value when writing one union member and reading another was necessary to justify compiler behavior when a function is passed pointers to different members of a union object. If it's ever legal to take the address of union members and use the resulting pointers without first casting to "char" or using memcpy, nothing in the Standard would justify the fact that writing one union member via pointer and reading another often has different behavior than writing and reading the union members directly. – supercat May 12 '16 at 22:10

This has always been "iffy". As others have noted, a footnote was added to C99 via a Technical Corrigendum. It reads as follows:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

However, footnotes are specified in the Foreword as non-normative:

Annexes D and F form a normative part of this standard; annexes A, B, C, E, G, H, I, J, the bibliography, and the index are for information only. In accordance with Part 3 of the ISO/IEC Directives, this foreword, the introduction, notes, footnotes, and examples are also for information only.

That is, the footnotes cannot prescribe behaviour; they should only clarify the existing text. It's an unpopular opinion, but the footnote quoted above actually fails in this regard: there is no such behaviour prescribed in the normative text. Indeed, there are contradictory sections, such as 6.7.2.1:

... The value of at most one of the members can be stored in a union object at any time

In conjunction with 6.5.2.3 (regarding accessing union members with the "." operator):

The value is that of the named member

I.e. if the value of only one member can be stored, the value of another member is non-existent; since "the value is that of the named member", naming a member whose value isn't currently stored must yield a non-existent value. This strongly implies that type punning via a union should not be possible. The same text still exists in the C11 document.

It's clear that the purpose of adding the footnote was to explicitly allow for type-punning; it's just that the committee seemingly broke the rules on footnotes not containing normative text, and introduced a contradiction when they did so. To accept the footnote, you really have to disregard the section that says footnotes aren't normative, or otherwise try to figure out how to interpret the normative text in such a way that supports the conclusion of the footnote (which I have tried, and failed, to do), and then you have to reconcile that with the "non-existent value" problem I outlined above.

About the best we can do to ratify the footnote is to make some assumptions about the definition of a union as a set of "overlapping objects", from 6.2.5:

A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type

Unfortunately there is no elaboration on what is meant by "overlapping". An object is defined as a (3.14) "region of data storage in the execution environment, the contents of which can represent values" (that the same region of storage can be identified by two or more distinct objects is implied by the "overlapping objects" definition above, that is, objects have an identity which is separate to their storage region). The reasonable assumption seems to be that union members (of a particular union instance) use the same storage region.
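One piece of the "overlapping" picture is pinned down by normative text: 6.7.2.1 says that a pointer to a union object, suitably converted, points to each of its members. That fixes where every member starts, though it does not by itself say what reading an inactive member yields. A small check of the starting-address part, using a hypothetical union `overlap` and helper `members_share_address`:

```c
#include <stdbool.h>

/* Hypothetical union used only to compare member addresses. */
union overlap {
    int a;
    float b;
    double c;
};

/* Returns true if every member begins at the address of the union itself.
   Only addresses are taken; no member value is read. */
bool members_share_address(void)
{
    union overlap x;
    void *base = &x;
    return base == (void *)&x.a
        && base == (void *)&x.b
        && base == (void *)&x.c;
}
```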

Even if we ignore 6.7.2.1/6.5.2.3 and allow, as the footnote suggests, that reading any union member returns the value that would be represented by the contents of the corresponding storage region—which would therefore allow for type punning—the ever-problematic strict-aliasing rule in 6.5 disallows (with certain minor exceptions) accessing an object other than by its type. Since an "access" is an (3.1) "〈execution-time action〉 to read or modify the value of an object", and since modifying one of a set of overlapping objects necessarily modifies the others, then the strict-aliasing rule could potentially be violated by writing to a union member (regardless of whether it is then read through another, or not).

For example, by the wording of the standard, together with the notion that each member exists as a distinct object all overlapping the same storage, it seems like the following is illegal:

```c
union {
   int a;
   float b;
} u;

void example(void)
{
    u.b = 0.5;  // store a float value in the union object subobject
    u.a = 0;    // (#1) modifies a float object by an lvalue of type int
    int *pa = &u.a;
    *pa = 1;    // (#2) also modifies a float object, without union lvalue involved
}
```

(Specifically, the lines marked as #1 and #2 would break the strict-aliasing rule. In both cases this could perhaps be avoided if storing to a member erases the value of any previously active member, as is suggested by 6.7.2.1, though as pointed out previously this by-and-large prohibits type punning via a union).

Strictly speaking, the footnote speaks to a separate issue, that of reading an inactive union member; however the strict-aliasing rule in conjunction with other sections as noted above seriously limits its applicability and in particular means that it does not allow type-punning in general (but only for specific combinations of types).

Frustratingly, the committee responsible for developing the standard seem to intend for type-punning to generally be possible via a union, and yet do not appear to be troubled that the normative text of the standard still makes no requirement for it.

Worth noting also is that the consensus understanding (by compiler vendors) seems to be that type punning via a union is allowed, but "access must be via the union type" (e.g., the first commented line in the example above, but not the second). It's a little unclear whether this should apply to both read and write accesses, and is in no way supported by the text of the standard (disregarding the footnote).
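A sketch of that distinction, using a hypothetical `union pun` and helper `bits_via_union`: both the store and the load are member accesses on a union lvalue (`p->f`, `p->u`), which is the "via the union type" pattern described above. The contested pointer form is kept in a comment rather than compiled, since it is exactly the case a compiler applying strict aliasing may treat as non-aliasing. The code assumes a 4-byte IEEE 754 `float`.

```c
#include <stdint.h>

/* Hypothetical union used for illustration. */
union pun {
    float f;
    uint32_t u;
};

/* Store and load both go through member access on the union itself
   (p->f, p->u): the "access via the union type" form. */
uint32_t bits_via_union(union pun *p, float value)
{
    p->f = value;
    return p->u;
}

/* The contested form, deliberately not compiled here: the pointers come
   from union members, but the accesses use plain float and uint32_t
   lvalues with no union involved, so a compiler applying strict aliasing
   may assume the store cannot affect the load.

       float *fp = &p->f;
       uint32_t *up = &p->u;
       *fp = value;
       return *up;
*/
```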

In conclusion: while it is largely accepted that type punning via a union is legal (most consider it allowed only if the access is done "via the union type", so to speak), the normative wording of the standard prohibits it in all but certain trivial cases, and in practice there are limitations beyond what the (non-normative) footnote that does appear to allow type punning would imply.

The section you quote:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

... has to be read carefully, though. "The bytes of the object representation that do not correspond to that member" is referring to bytes beyond the size of the member, which isn't itself an issue for type punning (except that you cannot assume writing to a union member will leave the "extra" part of any larger member untouched).

davmac
  • Hmm... [§ 6.2.5/20](http://port70.net/~nsz/c/c11/n1570.html#6.2.5p20) states that union members overlap. Combining this with "The value of at most one of the members can be stored in a union object at any time" allows the latter to be interpreted as stating that this overlapping storage space can contain the value of exactly one member at a time, which by extension means that all inactive members would at that time be alternate views of some or all of the active member (due to addressing the same overlapping storage space). – Justin Time - Reinstate Monica Apr 14 '17 at 22:41
  • For example, given `typedef union { int i; char c; } U;` and `U u;`, if `u.i` is assigned the value `5`, then `u.c` would be one of the bytes of integer literal `5` due to sharing the same storage space. This mandate that union members overlap is thus the lynchpin that allows an interpretation which supports type punning. – Justin Time - Reinstate Monica Apr 14 '17 at 22:44
  • That feels like it shouldn't be a valid interpretation, but it does technically match the letter of the standard. – Justin Time - Reinstate Monica Apr 14 '17 at 22:49
  • @JustinTime "which by extension means that all inactive members would at that time be alternate views of some or all of the active member (due to addressing the same overlapping storage space)" - that is, IMO, an extrapolation but not a definite logical necessity; the problem is that it is not really specified what the "overlap" entails except via the non-normative footnote, and the concept of objects as merely a view on storage is somewhat inconsistent with eg. the strict aliasing rule. – davmac Apr 18 '17 at 09:52
  • @JustinTime in any case the definition of member access in 6.5.2.3 is problematic. If the value is "that of the named member" and the named member does not have a stored value, then there is clearly a problem. It doesn't say that "the value is that determined by the representation stored in the object corresponding to the member", which it needs to in order to allow your interpretation, I think. Although, as I say in the answer, this is presumably what is actually intended. – davmac Apr 18 '17 at 10:19
  • That's a very good point. I just assumed that by saying they overlap, it meant that they all use a shared memory space large enough for the largest member of the union, to try and find the rules lawyering used to interpret it to support type punning. It definitely should be written clearer if it's meant to allow type punning, I agree. – Justin Time - Reinstate Monica Apr 18 '17 at 17:36
  • @JustinTime: Under C89, given `union {T1 v1; T2 v2;} u;`, the behaviors both `u.v1 = thing1; thing2 = u.v2;` and `T1 *p1=&u.v1; T2 *p2=&u.v2; *p1=thing1; thing2=*p2;` would be defined by the Standard in the same cases (e.g. those involving compatible types or the Common Initial Sequence rule), and would be Implementation-Defined in all others. Implementations may have latitude to treat such accesses differently based upon the form of lvalue used, but nothing in the C89 makes such distinctions. C89 could not allow type punning in the first case without doing likewise in the second. – supercat Sep 30 '17 at 17:02

However, this appears to violate the C99 standard (at least draft n1124), where section 6.2.6.1.7 states some stuff. Is this behavior actually unspecified under C99?

No, you're fine.

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

This applies to data blocks of different sizes. I.e., if you have:

```c
union u
{
    float f;
    double d;
};
```

and you assign something to f, it would change the lower 4 bytes of d, but the upper 4 bytes would be in an indeterminate state.

Unions exist primarily for type punning.
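To make the float/double case concrete without reading the partly unspecified `d`, the bytes can be examined directly through `unsigned char` (which `memcmp` does), and that is always allowed. The sketch repeats the `union u` declaration so it compiles on its own; the helper name `float_prefix_matches` is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

union u
{
    float f;
    double d;
};

/* After storing to f, the bytes that f occupies (the first sizeof(float)
   bytes, since every member starts at offset 0) hold f's representation.
   The remaining bytes of d take unspecified values, so d is never read. */
bool float_prefix_matches(float value)
{
    union u x;
    unsigned char expected[sizeof value];

    x.f = value;
    memcpy(expected, &value, sizeof value);
    return memcmp(&x, expected, sizeof value) == 0;
}
```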

Wug
  • Unions do not exist for the sole purpose of type punning. Unions exist because sometimes you want to store one type of object and later retrieve it, and sometimes you want to store a different type of object and later retrieve it. – Eric Postpischil Jul 24 '12 at 22:09
  • That's type punning. From wikipedia: *type punning is a common term for any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language* – Wug Jul 24 '12 at 22:11
  • *Unions exist for the sole purpose of type punning* I think unions were added to the language to save space rather. – ouah Jul 24 '12 at 22:15
  • No, type punning is writing one member and reading another. Writing one member and reading the same member is not punning. Nor is writing one, reading it, writing a second member, and reading the second member. When you read the same member as was last written, you have not changed types, so you have not circumvented the type system. – Eric Postpischil Jul 24 '12 at 22:22
  • I'd argue that using a single container structure to store arbitrary values of different types in the same physical location qualifies regardless of whether or not the data is interpreted over different types in one situation or not. – Wug Jul 24 '12 at 22:30
  • "Type punning" is usually understood to mean writing as one type, and reading the same bits back as another. But `union` is commonly used inside a `struct`, alongside an `enum` which indicates which type the union currently holds. e.g. an interpreter might have a `struct value` which can contain an integer *or* a floating point value, which would have either `.type = T_INT` and `.u.int_val = 123`, or `.type = T_FLOAT` and `.u.float_val = 4.56`. In this case you only ever expect to read the same type from `.u` that was originally written, and I would *not* consider that to be "type punning". – Matthew Slattery Jul 24 '12 at 23:29
  • "_I'd argue that using a single container structure to store arbitrary values of different types in the same physical location qualifies_" Sorry, but you do **not** get to define what qualifies as "type punning". It is an old and precisely defined concept (reinterpreting the bytes of an object as another type). **Reusing unused storage** to write an object of another type definitely is **not** type punning! – curiousguy Nov 11 '13 at 23:22
  • "the upper 4 bytes would be in an indeterminate state." - actually they would be unspecified values, as shown by your earlier quote. – M.M Nov 26 '15 at 01:42
  • @MatthewSlattery this gets more complicated with the `sockaddr_` family of structures where the field that defines the type (`s*_family`) is _part of the structure_. – Alnitak Sep 05 '18 at 10:04
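The tagged-union pattern from Matthew Slattery's comment, sketched in C. `struct value`, `T_INT`, `T_FLOAT`, `int_val`, and `float_val` follow the comment; the helper functions `make_int`, `make_float`, and `value_as_double` are hypothetical. The only member ever read is the one recorded in `type`, so nothing here is type punning.

```c
enum value_type { T_INT, T_FLOAT };

struct value {
    enum value_type type; /* records which member of u was last stored */
    union {
        int int_val;
        double float_val;
    } u;
};

struct value make_int(int i)
{
    struct value v = { .type = T_INT, .u.int_val = i };
    return v;
}

struct value make_float(double x)
{
    struct value v = { .type = T_FLOAT, .u.float_val = x };
    return v;
}

/* Reads back only the member that type says is active. */
double value_as_double(struct value v)
{
    return v.type == T_INT ? (double)v.u.int_val : v.u.float_val;
}
```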