
Consider the following code:

union U
{
    int a;
    float b;
};

int main()
{
    U u;
    int *p = &u.a;
    *(float *)p = 1.0f; // <-- this line
}

We all know that the addresses of union members are usually the same, but I'm not sure whether doing something like this is well-defined behavior.

So the question is: is it legal and well-defined behavior to cast and dereference a pointer to a union member, as in the code above?


P.S. I know that it's more C than C++, but I'm trying to understand if it's legal in C++, not C.

HolyBlackCat
  • Why would you? But I'm pretty sure it would be legal, since *all* union members start at the same address. (Otherwise, it wouldn't be a union anymore.) – owacoder Oct 10 '15 at 16:47
  • it is legal but not recommended – vishal Oct 10 '15 at 16:47
  • As others have said, legal or otherwise, it is bad design! Semantically, a union should contain exactly one of its members. What you're trying to do sounds "clever", and you have to be twice as clever to fix a bug as you were when you created it. Don't be clever if there is another way. – Daniel Oct 10 '15 at 16:50
  • How is it more C than C++? Unions exist in both languages and so do pointers. – Thomas Matthews Oct 10 '15 at 16:57
  • @owacoder @Daniel Ohh, it's a long story. I'm trying to implement GLSL-style vectors. For them, I need behavior like this: `vec3 a(1,2,3); vec4 b = a.zxyy; // 3,1,2,2` To implement that behavior (`.zxyy`) I need the vector class to be a union. One of its fields is a structure with x, y, z and w members. The other fields are letter combinations like `zxyy`. For each such field I need a separate (empty) type and a separate set of (macro-generated) overloaded operators. – HolyBlackCat Oct 10 '15 at 16:58
  • These overloaded operators shall somehow access x, y, z and w fields. The only way I see is to cast an address of such empty class to pointer to an entire vector and then use `->x`, `->y`, `->z`, `->w` on it. – HolyBlackCat Oct 10 '15 at 16:58
  • @vishal Yes, I think it is, but I can't be sure without a reference from the standard... – HolyBlackCat Oct 10 '15 at 16:59
  • @ThomasMatthews I mean, it's more C-style than C++-style. – HolyBlackCat Oct 10 '15 at 17:00
  • Let's amend that: use the *first* field to access the other *first* field. If there's a type mismatch and e.g. this is a union of structs, the third field of the first struct may be somewhere entirely different from the third field of the second struct. – SF. Oct 10 '15 at 17:26

2 Answers


All members of a union must reside at the same address; the standard guarantees this. What you are doing is indeed well-defined behavior, but note that you cannot read from an inactive member of a union using the same approach.

Note: Do not use C-style casts; prefer reinterpret_cast in this case.


As long as all you do is write to the other data member of the union, the behavior is well-defined. As stated, though, the write changes which member is considered the active member of the union, so afterwards you can only read from the member you just wrote to.

union U {
    int a;
    float b;
};

int main () {
    U u;
    int *p = &u.a;
    *reinterpret_cast<float*> (p) = 1.0f; // ok, well-defined
}

Note: There is an exception to the above rule when it comes to layout-compatible types.
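
A minimal sketch of that exception, assuming it refers to the "common initial sequence" rule for standard-layout structs stored in a union. The types `Circle`, `Square` and `Shape` and the function `read_tag_through_inactive_member` are hypothetical names introduced only for illustration. The exact scope of the rule is debated, so treat this as an illustration rather than a definitive reading of the standard.

```cpp
#include <cassert>  // for the assert shown in the usage note

// Two standard-layout structs whose first member has the same type,
// so they share a common initial sequence consisting of `tag`.
struct Circle { int tag; float radius; };
struct Square { int tag; int   side;   };

union Shape {
    Circle circle;
    Square square;
};

// Hypothetical helper: `circle` is the active member, yet `tag` is read
// through `square`. Only members of the common initial sequence may be
// inspected this way; reading `square.side` here would not be covered.
int read_tag_through_inactive_member() {
    Shape s = { Circle{7, 2.5f} };  // aggregate init activates the first member
    return s.square.tag;
}
```

With this sketch, `assert(read_tag_through_inactive_member() == 7);` is expected to hold.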


The question can be rephrased as the following snippet, which is semantically equivalent to a boiled-down version of the "problem".

#include <type_traits>
#include <algorithm>
#include <cassert>

int main () {
  using union_storage_t = std::aligned_storage<
    std::max ( sizeof(int),   sizeof(float)),
    std::max (alignof(int),  alignof(float))
  >::type;

  union_storage_t u;

  int   * p1 = reinterpret_cast<  int*> (&u);
  float * p2 = reinterpret_cast<float*> (p1);
  float * p3 = reinterpret_cast<float*> (&u);

  assert (p2 == p3); // will never fire
}

What does the Standard (n3797) say?

9.5/1    Unions    [class.union]

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...] The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.
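
The two guarantees in this quote (sufficient size, and the same address for every member) can be checked directly on the union from the question. This is only a small sketch; the helper name `members_share_address` is made up for illustration.

```cpp
#include <cassert>  // for the assert shown in the usage note

union U {
    int   a;
    float b;
};

// The union must be large enough to hold its largest member.
static_assert(sizeof(U) >= sizeof(int),   "U must be able to hold an int");
static_assert(sizeof(U) >= sizeof(float), "U must be able to hold a float");

// Hypothetical helper: taking the address of a member does not read it,
// so this is fine regardless of which member is currently active.
bool members_share_address() {
    U u;
    u.a = 0;                 // make `a` the active member
    const void *pa = &u.a;
    const void *pb = &u.b;   // address only, `b` is never read
    const void *pu = &u;
    return pa == pb && pa == pu;
}
```

A call such as `assert(members_share_address());` should never fire on a conforming implementation.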

Note: The wording in C++11 (n3337) was underspecified, even though the intent has always been that of C++14.

Filip Roséen - refp
  • I guess it's underspecified, see [CWG 1116](http://wg21.cmeerw.net/cwg/issue1116) – dyp Oct 10 '15 at 17:09
  • @KerrekSB I will change to make the post reference C++14, since the wording is more clear (but the intent has always been that wording - even in C++11). – Filip Roséen - refp Oct 10 '15 at 17:10
  • Can you elaborate further on how writing to the `reinterpret_cast`'ed pointer doesn't violate the strict alias rules, regardless of what you can do with the union afterwards? – Mark B Oct 10 '15 at 17:47
  • @MarkB since the data-members are to be placed at the same address, the `reinterpret_cast` is fine (since we can interpret this address as the start of an object of type `T2` - even though it currently holds an object of type `T1`). – Filip Roséen - refp Oct 10 '15 at 17:50
  • @MarkB see the added example. – Filip Roséen - refp Oct 10 '15 at 17:58
  • "Note: There is an exception to the above rule when it comes to layout-compatible types." Where? – underscore_d Jan 05 '16 at 10:46
  • If you meant the "standard-layout `struct`s with common initial sequence" proviso, some people claim it only guarantees behaviour when the `union` is passed to the user (making its declaration visible) and punning is done by reading the `struct` initial members via `union` member accessors ('punning' `u.structA.a` and `u.structB.b`, not just `structA` and `structB`). I'm not convinced, but the wording is badly ambiguous. If there is a section that sets defined behaviour for reads/writes from/to different layout-compatible `union` members, treated as individual objects, please let me know. – underscore_d Jan 05 '16 at 10:54

Yes, it is legal. Using explicit casts, you can do almost anything.

As other comments have stated, all members of a union start at the same address, so casting a pointer to one member into a pointer to a different member is pointless.

The assembly language will be the same. You want to make the code easy to read so I don't recommend the practice. It is confusing and there is no benefit.

Also, I recommend a "type" field so that you know when the data is in float format versus int format.
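
One possible shape for such a "type" field is sketched below. The names `Number`, `Kind`, `make_int`, `make_float` and `to_double` are hypothetical and not taken from the question; the point is only that the tag is written together with the value and checked before every read.

```cpp
#include <cassert>  // for the asserts shown in the usage note

// Hypothetical tagged wrapper: `kind` records which union member is active.
struct Number {
    enum class Kind { Int, Float };
    Kind kind;
    union {
        int   i;
        float f;
    };
};

Number make_int(int v) {
    Number n;
    n.kind = Number::Kind::Int;
    n.i = v;
    return n;
}

Number make_float(float v) {
    Number n;
    n.kind = Number::Kind::Float;
    n.f = v;
    return n;
}

// Reads only the member that the tag says is active.
double to_double(const Number &n) {
    return n.kind == Number::Kind::Int ? static_cast<double>(n.i)
                                       : static_cast<double>(n.f);
}
```

For example, `assert(to_double(make_int(3)) == 3.0);` and `assert(to_double(make_float(1.5f)) == 1.5);` both hold, and no inactive member is ever read.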

Thomas Matthews