If I have a union with two data members of the same type, differing only by CV-qualification:

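#include <utility>

template<typename T>
union A
{
    private:
        T x_priv;
    public:
        const T x_publ;

    public:
        // Accept-all constructor: perfectly forwards its arguments to x_priv
        template<typename... Args>
        A(Args&&... args) : x_priv(std::forward<Args>(args)...) {}

        // Destructor: destroys x_priv, which is assumed to be the active member
        ~A() { x_priv.~T(); }
};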

And I have a function f that declares a union A, thus making x_priv the active member and then reads x_publ from that union:

int f()
{
    A<int> a {7};

    return a.x_publ;
}

Every compiler I tested accepted this without errors, and it ran correctly both for int and for more complex types such as std::string and std::thread.

I checked the standard to see whether this is legal, starting with the difference between T and const T:

6.7.3.1 [basic.type.qualifier]

The cv-qualified or cv-unqualified versions of a type are distinct types; however, they shall have the same representation and alignment requirements ([basic.align]).

This means that when declaring a const T it has the exact same representation in memory as a T. But then I found that the standard actually disallows this for some types, which I found weird, as I see no reason for it.
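The size and alignment part of that guarantee can be checked at compile time. The following is a minimal sketch; `same_size_and_alignment` is a hypothetical helper introduced only for illustration, not something from the standard library:

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: true when T and const T agree in size and alignment,
// which is what [basic.type.qualifier] requires of every implementation.
template <typename T>
constexpr bool same_size_and_alignment() {
    return sizeof(T) == sizeof(const T) && alignof(T) == alignof(const T);
}

// T and const T are nevertheless distinct types.
static_assert(!std::is_same<int, const int>::value,
              "int and const int are distinct types");

static_assert(same_size_and_alignment<int>(),
              "int and const int share size and alignment");
static_assert(same_size_and_alignment<std::string>(),
              "std::string and const std::string share size and alignment");
```

Note that these checks only confirm the representation guarantee. They say nothing about whether reading a union member that is not active is permitted.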

I started my search on accessing non-active members.

It is only legal to access the common initial sequence of T and const T if both are standard-layout types.

10.4.1 [class.union]

At most one of the non-static data members of an object of union type can be active at any time [...] [ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence ([class.mem]), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see [class.mem]. — end note ]

The initial sequence is basically the order of the non-static data members with a few exceptions, but since T and const T have the exact same members in the same layout, this means that the common initial sequence of T and const T is all of the members of T.

10.3.22 [class.mem]

The common initial sequence of two standard-layout struct ([class.prop]) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types, either both entities are declared with the no_unique_address attribute ([dcl.attr.nouniqueaddr]) or neither is, and either both entities are bit-fields with the same width or neither is a bit-field. [ Example:

And here is where the restrictions come in: the following rule prevents some types from being accessed this way, even though they have the exact same representation in memory:

10.1.3 [class.prop]

A class S is a standard-layout class if it:

  • (3.1) has no non-static data members of type non-standard-layout class (or array of such types) or reference,
  • (3.2) has no virtual functions and no virtual base classes,
  • (3.3) has the same access control for all non-static data members,
  • (3.4) has no non-standard-layout base classes,
  • (3.5) has at most one base class subobject of any given type,
  • (3.6) has all non-static data members and bit-fields in the class and its base classes first declared in the same class, and
  • (3.7) has no element of the set M(S) of types as a base class, where for any type X, M(X) is defined as follows. [ Note: M(X) is the set of the types of all non-base-class subobjects that may be at a zero offset in X. — end note ]
    • (3.7.1) If X is a non-union class type with no (possibly inherited) non-static data members, the set M(X) is empty.
    • (3.7.2) If X is a non-union class type with a non-static data member of type X_0 that is either of zero size or is the first non-static data member of X (where said member may be an anonymous union), the set M(X) consists of X_0 and the elements of M(X_0).
    • (3.7.3) If X is a union type, the set M(X) is the union of all M(U_i) and the set containing all U_i, where each U_i is the type of the ith non-static data member of X.
    • (3.7.4) If X is an array type with element type X_e, the set M(X) consists of X_e and the elements of M(X_e).
    • (3.7.5) If X is a non-class, non-array type, the set M(X) is empty.
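Some of these conditions can be probed with `std::is_standard_layout`. The sketch below uses hypothetical types (`Plain`, `MixedAccess`, `HasVirtual`, `MixedUnion`) made up for illustration. `MixedUnion` mirrors the shape of `A<int>` from the question, and it fails condition (3.3) because its two members have different access control:

```cpp
#include <type_traits>

// All members public, no virtual functions: standard-layout.
struct Plain { int a; int b; };

// Violates (3.3): the members do not share the same access control.
struct MixedAccess {
    int a;
private:
    int b;
};

// Violates (3.2): has a virtual function.
struct HasVirtual {
    virtual ~HasVirtual() = default;
    int a;
};

// Same shape as the question's A<int>: one private and one public member.
union MixedUnion {
private:
    int x_priv;
public:
    const int x_publ;
    explicit MixedUnion(int v) : x_priv(v) {}
};

static_assert(std::is_standard_layout<Plain>::value,
              "Plain is standard-layout");
static_assert(!std::is_standard_layout<MixedAccess>::value,
              "mixed access control breaks standard-layout");
static_assert(!std::is_standard_layout<HasVirtual>::value,
              "virtual functions break standard-layout");
static_assert(!std::is_standard_layout<MixedUnion>::value,
              "the union itself is not standard-layout");
```

So, independently of what T is, a union that mixes private and public data members is not a standard-layout union.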

My question is: is there any reason for this not to be valid behavior?

Essentially is it that:

  • The standard makers forgot to account for this particular case?

  • I haven't read some part of the standard that allows this behavior?

  • There's some more specific reason for this not to be valid behavior?

A reason for this to be valid syntax is, for example, having a 'readonly' variable in a class, as such:

struct B;

struct A
{
     ... // Everything that struct A had before

     friend B;
};

struct B
{
    A member;

    void f() { member.x_priv = 100; }
};

int main()
{
    B b;
    b.f();                   // Modifies the value of member.x_priv
    //b.member.x_priv = 100; // Invalid, x_priv is private
    int x = b.member.x_publ; // Fine, x_publ is public
}

This way you don't need a getter function, which can add call overhead (although most compilers would optimize it away), still adds to your class's interface, and forces callers to write int x = b.get_x().

Nor would you need a const reference member referring to that variable (as described in this question). That approach works well, but it adds to the size of your class, which can be a problem for sufficiently big classes or for classes that need to be as small as possible.

It is also awkward to have to write b.member.x_priv instead of b.x_priv. This would be fixable if anonymous unions could have private members; then we could rewrite it like this:

struct B
{
    union
    {
        private:
            int x_priv;
        public:
            int x_publ;

        friend B;
    };

    void f() { x_priv = 100; }
};

int main()
{
    B b;
    b.f();            // Modifies the value of x_priv
    //b.x_priv = 100; // Invalid, x_priv is private
    int x = b.x_publ; // Fine, x_publ is public
}

Another use case might be to give several names to the same data member. For example, in a Shape, the user might want to refer to the position as shape.pos, shape.position, shape.cur_pos or shape.shape_pos.

Although this would probably create more problems than it is worth, it might be useful when, for example, a name is being deprecated.

Filipe Rodrigues
  • "*My questions is is there any reason for this to not be valid behavior?*" A better question is whether there is a reason for it to *be* valid behavior. There's basically no point to it; you can just have a public member function that returns a `const&` to the private member. Just because implementations may let you get away with it is not a good reason for something to be well-defined. – Nicol Bolas Sep 28 '18 at 19:33
  • Also, I'm not sure `language-lawyer` applies as a tag when you're asking about the *reasoning* for a feature. The standard is what it is; it doesn't explain *why* it is a certain way. – Nicol Bolas Sep 28 '18 at 19:34
  • @NicolBolas I wasn't sure about the tag, but I chose to include it because of the first point, about whether the standard overlooked this. Should I remove it? I also edited the question to add a reason for it to be valid behavior. – Filipe Rodrigues Sep 28 '18 at 20:30
  • The workaround is not to add a const ref member to your class (which, like you said, typically increases the size of the class). The workaround is to add a member function that returns a const ref. – Brian Bi Dec 28 '22 at 21:27

1 Answer

Code like this:

#include <stdio.h>

struct A { int i; };
struct B { int j; };
union U {
    struct A a;
    struct B b;
};
int main() {
    union U u;
    u.a.i = 1;
    printf("%d\n", u.b.j);
}

is valid in C. For the sake of backward compatibility, it was considered desirable to ensure that it is also valid in C++. The special rules about common initial sequences of standard-layout structs ensure this backward compatibility. Extending the rule to allow more cases to be well-defined—ones involving non-standard-layout structs—is not necessary for C compatibility, since all structs that can be defined in the common subset of C and C++ are automatically standard-layout structs in C++.
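For comparison, here is a sketch of the same pattern written as C++. The function names `read_b_after_writing_a` and `demo` are hypothetical, chosen only for this illustration. The read is permitted because A and B are standard-layout structs whose common initial sequence covers A::i and B::j:

```cpp
#include <cassert>
#include <type_traits>

struct A { int i; };
struct B { int j; };

union U {
    A a;
    B b;
};

// The common-initial-sequence rule only applies to standard-layout types.
static_assert(std::is_standard_layout<A>::value &&
              std::is_standard_layout<B>::value &&
              std::is_standard_layout<U>::value,
              "A, B and U must all be standard-layout");

int read_b_after_writing_a() {
    U u;
    u.a.i = 1;     // a becomes the active member
    return u.b.j;  // allowed: j lies in the common initial sequence of A and B
}

// Usage example: the value written through a is observed through b.
void demo() {
    assert(read_b_after_writing_a() == 1);
}
```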

Actually, the C++ rules are a little bit more permissive than required for C compatibility. They allow some cases involving base classes too:

struct A { int i; };
struct B { int j; };
struct C : A { };
struct D : B { };
// C and D have a common initial sequence consisting of C::i and D::j

But in general, structs in C++ can be much more complicated than their C counterparts. They can, for example, have virtual functions and virtual base classes, and those can affect their layout in an implementation-defined manner. For this reason, it's not so easy to make more cases of type punning through unions well-defined in C++. You would really have to sit down with implementers and discuss what the conditions would be such that the committee should mandate that two classes have the same layout for their common initial sequence and not leave it up to the implementation. Currently, that mandate applies only to standard-layout classes.

There are various rules in the standard that are strong enough to imply that T and const T always have the exact same layout even if T is not a standard-layout class. For this reason, it would be possible to make certain forms of type punning between a T member and a const T member of a union well-defined even if T is not standard-layout. However, adding only this very special case to the language is of dubious value and I think it's unlikely that the committee would accept such a proposal unless you have a really compelling use case. Not wanting to provide a getter that returns a const reference, simply because you don't want to write the () to call the getter each time you need access, is unlikely to convince the committee.
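A minimal sketch of the getter approach mentioned here and in the comments, assuming C++14 or later. The names `Value`, `Owner` and `read_after_write` are hypothetical stand-ins for the question's types:

```cpp
// Stand-in for the question's union A: one private int, read-only to outsiders.
class Value {
public:
    // Public read-only access; a trivial inline accessor like this adds
    // no data member to the class.
    constexpr const int& x() const noexcept { return x_; }

private:
    int x_ = 0;
    friend struct Owner;  // only Owner may modify x_
};

// Stand-in for the question's struct B.
struct Owner {
    Value member;

    constexpr void f() noexcept { member.x_ = 100; }
};

// Usage example: write through the friend, read through the getter.
constexpr int read_after_write() {
    Owner o{};
    o.f();
    return o.member.x();
}

static_assert(read_after_write() == 100,
              "the getter observes the value written by Owner::f");
```

Code outside Owner can read the value with b.member.x() but cannot assign to it, which is the 'readonly' behavior the question asks for, at the cost of writing the ().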

Brian Bi