union 'punning' structs w/ "common initial sequence": Why does C (99+), but not C++, stipulate a 'visible declaration of the union type'?

Question

Background

Discussions on the mostly un-or-implementation-defined nature of type-punning via a union typically quote the following bits, here via @ecatmur ( https://stackoverflow.com/a/31557852/2757035 ), on an exemption for standard-layout structs having a "common initial sequence" of member types:

C11 (6.5.2.3 Structure and union members; Semantics):

[...] if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

C++03 ([class.mem]/16):

If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

Other versions of the two standards have similar language; since C++11 the terminology used is standard-layout rather than POD.

Since no reinterpretation is required, this isn't really type-punning, just name substitution applied to union member accesses. A proposal for C++17 (the infamous P0137R1) makes this explicit using language like 'the access is as if the other struct member was nominated'.
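
To make the quoted rule concrete, here is a minimal C sketch of the uncontroversial case, where the access goes through the union object itself. The names `shape`, `circle`, `rect`, `shape_kind` and `make_rect`, and the tag value 2, are hypothetical illustrations and appear in neither standard.

```c
/* Hypothetical types for illustration; not taken from either standard. */
struct circle { int kind; double radius; };
struct rect   { int kind; double w, h; };

/* Both structs start with `int kind`, so they share a common initial
   sequence of one member. */
union shape {
    struct circle c;
    struct rect   r;
};

/* Reads `kind` by naming the `c` member, whichever struct was stored last.
   The access is made through the union object itself. */
int shape_kind(const union shape *s) {
    return s->c.kind;
}

/* Stores a rect into the union; 2 is an arbitrary tag chosen for this sketch. */
union shape make_rect(double w, double h) {
    union shape s;
    s.r.kind = 2;
    s.r.w = w;
    s.r.h = h;
    return s;
}
```

Calling `shape_kind` on the result of `make_rect(3.0, 4.0)` reads `c.kind` while `r` is the stored member, which is exactly the inspection the quoted paragraphs permit.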

But please note the clause "anywhere that a declaration of the completed type of the union is visible": it exists in C11 but in no C++ draft for 2003, 2011, or 2014 (all nearly identical, though later versions replace "POD" with the new term "standard layout"). In any case, the 'visible declaration of union type' bit is totally absent from the corresponding section of every C++ standard.

@loop and @Mints97, here - https://stackoverflow.com/a/28528989/2757035 - show that this line was also absent in C89, first appearing in C99 and remaining in C since then (though, again, never filtering through to C++).

Standards discussions around this

[snipped - see my answer]

Questions

From this, then, my questions were:

What does this mean? What is classed as a 'visible declaration'? Was this clause intended to narrow down - or expand up - the range of contexts in which such 'punning' has defined behaviour?
Are we to assume that this omission in C++ is very deliberate?
What is the reason for C++ differing from C? Did C++ just 'inherit' this from C89 and then either decide - or worse, forget - to update alongside C99?
If the difference is intentional, then what benefits or drawbacks are there to the 2 different treatments in C vs C++?
What, if any, interesting ramifications does it have at compile- or runtime? For example, @ecatmur, in a comment replying to my pointing this out on his original answer (link as above), speculated as follows.

I'd imagine it permits more aggressive optimization; C can assume that function arguments S* s and T* t do not alias even if they share a common initial sequence as long as no union { S; T; } is in view, while C++ can make that assumption only at link time. Might be worth asking a separate question about that difference.

Well, here I am, asking! I'm very interested in any thoughts about this, especially: other relevant parts of the (either) Standard, quotes from committee members or other esteemed commentators, insights from developers who might have noticed a practical difference due to this - assuming any compiler even bothers to enforce C's added clause - and etc. The aim is to generate a useful catalogue of relevant facts about this C clause and its (intentional or not) omission from C++. So, let's go!

FWIW, at -O3, gcc, g++, clang and clang++ all assume that `S*` and `T*` arguments do not alias even when a union is in view. This means that a program that passes the aliasing `S*` and `T*` union subobjects will behave differently depending on optimization level. Example: http://coliru.stacked-crooked.com/a/b57c8dd9e2ef3a02 — ecatmur, Jan 05 '16 at 17:30
_what matters most from SO's perspective is generating a useful discussion_ - bear in mind discussion is explicitly off-topic here, so you may wish to edit that out. — halfer, Jan 05 '16 at 23:28
Well, I meant discussion as in _educated commentary on the questions raised_, but I can work on better wording later. — underscore_d, Jan 05 '16 at 23:50
@ecatmur Very interesting! `T` is, of course, updated to 42 'in the background' - so the _write_ isn't binned - but the optimiser doesn't reflect that in the return value, as it assumes, given no aliasing, the result must be 5. http://coliru.stacked-crooked.com/a/04921db9e5f3945a I'd need to test whether this affects me as (A) I'm generally not referring to such unions via pointers and (B) even less am I doing this via functions. There are probably numerous other ways this can bite me if this turns out to be a general behaviour relevant to such unions, though. Will post more findings tomorrow. — underscore_d, Jan 05 '16 at 23:57
... and the functions I do have using pointers to union members only take one at a time. What 'scope' is usually applied when deciding whether to ignore/reorder operations that might alias? Assuming it's somewhat broader than 'any function with 2+ pointer arguments', is there a general rule, or is it _so_ UB that anything can happen? — underscore_d, Jan 06 '16 at 00:46
@ecatmur It is perhaps notable that `gcc` and `g++` alias when the member types are changed to `char` (showing 42 throughout, unlike before), but `clang` acts the same as when using `int`s. Which, if any, is more correct? Fwiw, 99.9% of cases in which I'd be wanting to use this pattern, the `struct`s would contain `unsigned char` only. I know there's an exception for `char` in aliasing but not how/if that's related to this observation. — underscore_d, Jan 06 '16 at 00:53
Aliasing analysis is generally performed at a function level, but functions can be inlined, and whole-program optimization is only getting better. The aliasing exception for character types doesn't apply when the char object is known to be a subobject, so gcc is being over-cautious. — ecatmur, Jan 06 '16 at 10:16
Thanks. Since you've shown aliasing is unaffected by `union` visibility in major compilers for C & C++, do you think that indicates it's not directly related to the added quote being discussed? Either way, am I 'safe' if (A) not using pointers to such members, (B) only passing 1 to any function, or (C) anywhere I need to alias, `reinterpret_cast`ing to/from `char` within scope? Also, if you know a good summary of all these nuances, preferably more condensed than the standard - few things I've read have pointed out crucial caveats like you have here. Sorry to keep bombarding you with questions! — underscore_d, Jan 06 '16 at 10:32

score 23 · Accepted Answer · edited May 23 '17 at 12:08

I've found my way through the labyrinth to some great sources on this, and I think I've got a pretty comprehensive summary of it. I'm posting this as an answer because it seems to explain both the (IMO very misguided) intention of the C clause and the fact that C++ does not inherit it. This will evolve over time if I discover further supporting material or the situation changes.

This is my first time trying to sum up a very complex situation, which seems ill-defined even to many language architects, so I'll welcome clarifications/suggestions on how to improve this answer - or simply a better answer if anyone has one.

Finally, some concrete commentary

Through vaguely related threads, I found the following answer by @tab - and much appreciated the contained links to (illuminating, if not conclusive) GCC and Working Group defect reports: answer by tab on StackOverflow

The GCC link contains some interesting discussion and reveals a sizeable amount of confusion and conflicting interpretations on the part of the Committee and compiler vendors, surrounding the subject of union member structs, punning, and aliasing in both C and C++.

At the end of that, we're linked to the main event - another BugZilla thread, Bug 65892, containing an extremely useful discussion. In particular, we find our way to the first of two pivotal documents:

Origin of the added line in C99

C proposal N685 is the origin of the added clause regarding visibility of a union type declaration. Through what some claim (see GCC thread #2) is a total misinterpretation of the "common initial sequence" allowance, N685 was indeed intended to allow relaxation of aliasing rules for "common initial sequence" structs within a TU aware of some union containing instances of said struct types, as we can see from this quote:

The proposed solution is to require that a union declaration be visible if aliases through a common initial sequence (like the above) are possible. Therefore the following TU provides this kind of aliasing if desired:

union utag {
    struct tag1 { int m1; double d2; } st1;
    struct tag2 { int m1; char c2; } st2;
};

int similar_func(struct tag1 *pst2, struct tag2 *pst3) {
     pst2->m1 = 2;
     pst3->m1 = 0;   /* might be an alias for pst2->m1 */
     return pst2->m1;
}

Judging by the GCC discussion and comments below such as @ecatmur's, this proposal - which seems to mandate speculatively allowing aliasing for any struct type that has some instance within some union visible to this TU - seems to have received great derision and rarely been implemented.

It's obvious how difficult it would be to satisfy this interpretation of the added clause without totally crippling many optimisations - for little benefit, as few coders would want this guarantee, and those who do can just turn on -fno-strict-aliasing (which IMO indicates larger problems). If implemented, this allowance is more likely to catch people out and interact spuriously with other declarations of unions than to be useful.

Omission of the line from C++

Following on from this and a comment I made elsewhere, @Potatoswatter in this answer here on SO states that:

The visibility part was purposely omitted from C++ because it's widely considered to be ludicrous and unimplementable.

In other words, it looks like C++ deliberately avoided adopting this added clause, likely due to its widely perceived absurdity. On asking for an "on the record" citation of this, Potatoswatter provided the following key info about the thread's participants:

The folks in that discussion are essentially "on the record" there. Andrew Pinski is a hardcore GCC backend guy. Martin Sebor is an active C committee member. Jonathan Wakely is an active C++ committee member and language/library implementer. That page is more authoritative, clear, and complete than anything I could write.

Potatoswatter, in the same SO thread linked above, concludes that C++ deliberately excluded this line, leaving no special treatment (or, at best, implementation-defined treatment) for pointers into the common initial sequence. Whether their treatment will in future be specifically defined, versus any other pointers, remains to be seen; compare to my final section below about C. At present, though, it is not (and again, IMO, this is good).

What does this mean for C++ and practical C implementations?

So, with the nefarious line from N685... 'cast aside'... we're back to assuming pointers into the common initial sequence are not special in terms of aliasing. Still, it's worth confirming what this paragraph in C++ means without it. Well, the 2nd GCC thread above links to another gem:

C++ defect 1719. This proposal has reached DRWP status: "A DR issue whose resolution is reflected in the current Working Paper. The Working Paper is a draft for a future version of the Standard" - cite. This is either post C++14 or at least after the final draft I have here (N3797) - and puts forward a significant, and in my opinion illuminating, rewrite of this paragraph's wording, as follows. I'm bolding what I consider to be the important changes, and {these comments} are mine:

In a standard-layout union with an active member {"active" indicates a union instance, not just type} (9.5 [class.union]) of struct type T1, it is permitted to read {formerly "inspect"} a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2. [Note: Reading a volatile object through a non-volatile glvalue has undefined behavior (7.1.6.1 [dcl.type.cv]). —end note]

This seems to clarify the meaning of the old wording: to me, it says that any specifically allowed 'punning' among union member structs with common initial sequences must be done via an instance of the parent union - rather than being based on the type of the structs (e.g. pointers to them passed to some function). This wording seems to rule out any other interpretation, a la N685. C would do well to adopt this, I'd say. Hey, speaking of which, see below!
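
As a hedged sketch of that reading, here is the N685 example reworked so that every access names the member through a union object rather than through bare struct pointers. The union and struct definitions are the ones from the N685 quote; the function name `via_union` is my own invention.

```c
/* Types as in the N685 quote above. */
union utag {
    struct tag1 { int m1; double d2; } st1;
    struct tag2 { int m1; char c2; } st2;
};

/* Hypothetical counterpart to N685's similar_func: both parameters are
   pointers to the union, so each member access is visibly made through
   a union object. */
int via_union(union utag *a, union utag *b) {
    a->st1.m1 = 2;
    b->st2.m1 = 0;      /* may refer to the same union object as *a */
    return a->st1.m1;   /* read of the common initial member, via the union */
}
```

When `a` and `b` point at the same object, the final read names `st1.m1` while `st2` was the last member written, which is the case the common initial sequence wording covers, so the call yields 0. With two distinct objects it yields 2. This is only my understanding of the rewritten paragraph, not an authoritative statement of what either committee intends.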

The upshot is that - as nicely demonstrated by @ecatmur and in the GCC tickets - this leaves such union member structs by definition in C++, and practically in C, subject to the same strict aliasing rules as any other 2 officially unrelated pointers. The explicit guarantee of being able to read the common initial sequence of inactive union member structs is now more clearly defined, not including vague and unimaginably tedious-to-enforce "visibility" as attempted by N685 for C. By this definition, the main compilers have been behaving as intended for C++. As for C?

Possible reversal of this line in C / clarification in C++

It's also very worth noting that C committee member Martin Sebor is looking to get this fixed in that fine language, too:

Martin Sebor 2015-04-27 14:57:16 UTC If one of you can explain the problem with it I'm willing to write up a paper and submit it to WG14 and request to have the standard changed.

Martin Sebor 2015-05-13 16:02:41 UTC I had a chance to discuss this issue with Clark Nelson last week. Clark has worked on improving the aliasing parts of the C specification in the past, for example in N1520 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1520.htm). He agreed that like the issues pointed out in N1520, this is also an outstanding problem that would be worth for WG14 to revisit and fix.

Potatoswatter inspiringly concludes:

The C and C++ committees (via Martin and Clark) will try to find a consensus and hammer out wording so the standard can finally say what it means.

We can only hope!

Again, all further thoughts are welcome.

Is there any *other* Standard means by which code which relies upon the Common Initial Sequence rule can work if structure types don't all have the same alignment, and might not all be defined in the same place? (E.g. `foo.h` defines a "header" struct whose total member length is not a multiple of alignment; `bar.h` and `boz.h` each define a union containing that structure and their own extended version; is there any way to write a function in foo.c which can accept pointers to the extended structure types from bar.h and boz.h and access the common members thereof?) — supercat, Jun 25 '16 at 20:37
If gcc didn't want to have to make pessimistic presumptions about aliasing of types that appear in unions, would there have been any problem whatsoever with saying that a union of structure types is an indication that common initial members of those types might alias *absent* an `__attribute` or `pragma` indicating otherwise? — supercat, Jun 25 '16 at 20:39
I notice your last edit still suggests that C clause is misguided. I'm still awaiting any practical way of adapting code that relies upon the ability to use a CIS of two types with different sizes and alignment requirements (behavior that used to be universally supported and non-controversial), so as to be defined under gcc's interpretation of the rules. Some compiler writers may not have liked what the rule required, but most of the "optimizations" it would curtail would be the phony sort that make code useless. — supercat, Sep 07 '16 at 21:34
Accessing the struct using an expression that gets it as a member of the union, rather than e.g. just as a pointer-to-struct with no context. — underscore_d, May 31 '18 at 12:53

score 6 · Answer 2 · answered Jan 05 '16 at 16:47

I suspect it means that the access to these common parts is permitted not only through the union type, but outside of the union. That is, suppose we have this:

union u {
  struct s1 m1;
  struct s2 m2;
};

Now suppose that in some function we have a struct s1 *p1 pointer which we know was lifted from the m1 member of such a union. We can cast this to a struct s2 * pointer and still access the members which are in common with struct s1. But somewhere in the scope, a declaration of union u has to be visible. And it has to be the complete declaration, which informs the compiler that the members are struct s1 and struct s2.

The likely intent is that if there is such a type in scope, then the compiler has knowledge that struct s1 and struct s2 are aliased, and so an access through a struct s1 * pointer is suspected of really accessing a struct s2 or vice versa.

In the absence of any visible union type which joins those types this way, there is no such knowledge; strict aliasing can be applied.

Since the wording is absent from C++, then to take advantage of the "common initial members relaxation" rule in that language, you have to route the accesses through the union type, as is commonly done anyway:

union u *ptr_any;
// ...
ptr_any->m1.common_initial_member = 42;
fun(ptr_any->m2.common_initial_member);  // pass 42 to fun

answered Jan 05 '16 at 16:47

Kaz

1

See, this is interesting... but also kinda depressing if correct. I interpreted the line to mean we _couldn't_ safely access members other than through an instance of the `union` in C, and C++ had no such restriction. However, if correct, your interpretation says exactly the opposite! So, if you're right, I'd have to rely on `g++`'s implementation after all...gah. We'll see what others think; I'm not yet a qualified `language-lawyer`, so I assume a consensus from those who are is likely to reflect the true meaning. If so, I hope C++ simply forgot to include this! Thanks a lot for the thoughts! – underscore_d Jan 05 '16 at 16:53
2

@underscore_d That's a lot to hope for: that C++ is *more* permissive than C in some area. – Kaz Jan 05 '16 at 21:42
1

Well, I thought @ecatmur had the same suspicion as me, but their tests don't seem all that encouraging, if not directly from a `union` perspective then from an aliasing one. Will run some tests of my own and see whether ambiguity arises in any non-pointer situations - certainly all my similar `union`s currently work fine, but I've not yet done anything with pointers to them + optimisation. At least I have `-fno-strict-aliasing` to fall back on... D'oh. – underscore_d Jan 06 '16 at 00:21
1

Congratulations on a correct suspicion :-) See my answer (when you have enough time to read it all...) – underscore_d Jan 06 '16 at 19:31
3

@underscore_d The suspicion had to be right because in what circumstance would we be accessing the member of a union, such that the (complete) declaration of the union is *not* in scope? The only way that is possible is that we obtain a pointer to that member and pass it out of scope. And that situation is not being ruled out: it is just being made subject to the `union` type still being in scope. Which must mean that the intent is that the `union` type informs the translator that the structs are involved in a union (so watch out: pointers to those struct types may be to union members). – Kaz Jan 06 '16 at 21:07
1

When you put it like that, it seems so obvious! And of course, textually, it supports your second indication too, that C++ is rarely more permissive. I think, on balance, it's good C++ did not import this allowance - and that C compilers seem to have largely or totally ignored it as highly impractical. – underscore_d Jan 06 '16 at 21:23
3

@underscore_d The thing is, that it's not actually an *allowance* but a *restriction*! From the perspective of the programmer, it's an allowance. But the standard is a set of requirements *for implementors* (many of which are given in terms of program behavior). The C standard is *tighter*: it *requires* implementors to be careful in their optimizations and support that usage. If that code is ported to C++, it has undefined behavior. Valid C that is undefined C++, though not diagnosed in C++, doesn't bode well for C++. Nobody wants *new* UB concerns when converting C to C++! – Kaz Jan 06 '16 at 21:41
1

Again, I like how you state things from other perspectives, so thanks! In practical terms, I wonder how many people actually relied on this clause, as certainly the compiler implementors seem to have dismissed it. I agree it would be unfortunate if something silently broke because of this, though to play devil's advocate, it might ultimately serve as a good caution that (A) questionably useful choices can be made - and possibly reversed - by committees, and (B) C/C++ do not have a subset/superset relationship. I just hope if anything does break, it's not in a life-or-death application... – underscore_d Jan 06 '16 at 21:48
1

@underscore_d Compiler implementors can dismiss it if they don't plan to actually somehow take advantage of the assumption that two different struct types do not overlap, even if they have members of the same type/name in the same places. For instance, the program does `a->count = 0`, but the register-cached value of `b->count` remains unaffected, even though `a` and `b` point to the same memory and `count` is of the same type at the same offset. If that deep kind of aliasing optimization doesn't happen, then the requirement is *implicitly* being met and the implementors can ignore it. – Kaz Jan 06 '16 at 21:52
2

Sure, but here I meant "ignore" in the sense that GCC and Clang _do_ have well-developed optimisers that take advantage of strict aliasing, and neither make an exception for aliasing in a 'visibly declared `union`'. So it's a kind of active ignorance due to complications/disagreement, rather than an implicit one by satisfying the clause as a side-effect of other things – underscore_d Jan 06 '16 at 21:57
4

Even in C89 mode, gcc no longer recognizes the common initial sequence rule for union members accessed through pointers of their individual types, even when the object in question is a union and the code using the union is in the same translation unit as the code acting upon the members; I see no way in which such compiler behavior can be deemed legitimate, since nothing in C89 would forbid such usage. – supercat Apr 19 '16 at 22:37
2

@underscore_d: [see above comment] If the existence of a union of structures s1 and s2 doesn't allow use of an s1* to access common members of an s2, how should code portably access common members of a pointer which is known to identify either an s1 or an s2 (but it's unknown which) if the types may have differing alignment requirements [e.g. s1 contains uint16_t but s2 contains some uint16_t and some uint64_t], and if not all instances of s1 satisfy the alignment of s2? By my understanding, casting an unaligned s1* to a union type containing s2 would be UB, would it not? – supercat Apr 19 '16 at 22:44
1

@supercat Did GCC ever adhere to this? I can't help but feel the clause on pointers was a piece of conflation with the rule for accessing the CIS via a union instance - which some other folk picked up, ran in a totally different direction, and managed to get a very different - or at least ambiguous - conclusion into the Standard... Having backed compiler writers into a place where both (A) determining the Standard's intent, and (B) following one literal reading of it, are extremely impractical, it seems the only practical solution is to nix the offending clause (or adjust it to agree with GCC) – underscore_d Apr 20 '16 at 10:35
3

@underscore_d: The Standard's intent is clear, whether compiler writers like it or not. In C89, code which received from another translation unit pointers to two structure types that shared a CIS and might be part of a union in that other translation unit had to regard accesses to the CIS of one as a potential access to the CIS of either. From a practical standpoint, mainstream compilers prior to C99 unanimously (do you know of any exceptions?) regarded the CIS rule as extending to type-punned pointers rather than just unions, since that made the rule more useful and... – supercat Apr 20 '16 at 14:31
3

...there was no benefit to interpreting it more narrowly. Nonetheless, the authors of C99 ignored that and viewed the rule as only applying to accesses through unions, but recognized that there had to be a way to tell the compiler that the CIS of two structure types could alias. Rather than invent a new syntax, they use the definition of a union containing both structures as a means of providing such notice. A program which passed pointers to members of a union defined in one translation unit to another unit where they would be used in such a fashion as to cause aliasing in the CIS... – supercat Apr 20 '16 at 14:36
1

@supercat Discussion on GCC's bugzilla implies it never supported this. Or have I read it wrongly? Regardless, I have no influence on any of this, and nor am I well-versed enough on either C or its implementations (as is probably evident) to be able to provide you with any data on how this might have been handled then and now. I can only restate my humble opinion that GCC's deliberately chosen interpretation/behaviour (only allow aliasing via a `union` instance itself) is practical, if possibly unfortunate from a compliance perspective - which doesn't get us anywhere! – underscore_d Apr 20 '16 at 14:52
2

...would not be C99 compliant, but could be made compliant by copying the union definition into the second translation unit. Since there is no other mechanism via which such C89 code could be made compliant without a massive rewrite, I find it hard to believe that the rule about visible union definitions was not intended to allow such code to be brought into compliance. While I'll agree that a good language needs to have ways of letting a compiler know what things won't alias, and C would be lacking in that regard if it had to be generous with regard to CIS rules... – supercat Apr 20 '16 at 14:55
1

@supercat Nice point about this potentially being an attempt to enable simple 'legalisation' of C89 code to C99, though. I still can't vouch for what GCC and co may or may not have done - i.e. what their status was vs this aspect of the standard - at any given points in their history, having only started using them in 2008 or so, and only with real gusto in the last couple of years. – underscore_d Apr 20 '16 at 14:57
2

@underscore_d: ...it is not such generosity which would make C inadequate. Good aliasing analysis requires means via which a program can say "I am going to use this range of memory exclusively as this type until further notice" and "I am done using this range of memory as this type"; for efficiency, both notifications should have means of indicating whether the present content of the memory is of any interest. – supercat Apr 20 '16 at 14:57
1

@supercat All I can really conclude with is, given how many complex and nigh-philosophical questions this raises, I can see why strict (non-)aliasing is a thing! I will say that the part of this that C++ (and GNU C) _does_ retain - i.e. the allowance for sharing CIS members between different `struct`s _accessed via_ a `union` - is absolutely essential in my current project, so I'm very glad for that. Thankfully I've not (yet?) needed to worry about pointer aliasing, and as mentioned, the complexity of it all makes me hope I don't have to ;-) – underscore_d Apr 20 '16 at 15:00
3

@underscore_d: Either this rule was intended to allow code that was strictly conforming under C89 to be easily made strictly conforming under C99, or the authors of the Standard were willing to forbid strictly-conforming C89 constructs without offering any replacement. I don't know the history of gcc, but I think it was somewhere around 2005-2009 that it started trying to get aggressive with aliasing of CIS members. I can believe that GCC never cared about whether a union of two structure types existed anywhere in deciding whether CIS members could alias, but that's because... – supercat Apr 20 '16 at 15:06
3

...it had no obligation to care back when it would always behave as though a union definition might exist. Behaving instead as though no union definition exists--even when one does--was a major change, no matter how loudly gcc's maintainers may claim it wasn't. – supercat Apr 20 '16 at 15:12
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/109702/discussion-between-underscore-d-and-supercat). – underscore_d Apr 20 '16 at 15:17
1

...where I can only assume that if GCC previously "would always behave as though a union definition might exist", that was merely because it was less aggressive across-the-board when performing aliasing analyses - rather than being specific to this clause of the Standard - seeing as the GCC devs' stated objection to the pro-aliasing _interpretation_ of this clause appears to be _exactly_ that adhering to it would require their compiler to "behave as though a union definition might exist" or at least hunt the code space for one, neither of which they wanted to do... but yeah, off to chat we go. – underscore_d Apr 20 '16 at 15:24
1

@underscore_d: Under the C99 rules, a compiler wouldn't have to hunt for one but merely observe whether one is defined *at the point where the structure is used*, which could be easily accommodated by keeping for each structure type a list of other structure types that cohabit with it in at least one union, and which have at least one member in the CIS. Honoring the CIS in such cases might impede some optimizations, but I think the cost of even honoring it globally would be far less than the gloom and doom gcc maintainers claim. – supercat Apr 20 '16 at 17:32
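
For readers trying to picture the bookkeeping described in that last comment, here is a rough C sketch. It is purely illustrative and does not reflect how GCC, Clang or any other compiler is implemented. The names `cohabit`, `note_union`, `cis_may_alias` and `MAX_TYPES` are hypothetical, struct types are reduced to small integer ids, and the sketch assumes every pair of members of a recorded union shares a non-empty common initial sequence.

```c
#include <stdbool.h>

#define MAX_TYPES 8   /* hypothetical cap on struct type ids in this sketch */

/* cohabit[a][b] becomes true once a union containing both struct type a
   and struct type b has been recorded. Assumption: every such pair shares
   a non-empty common initial sequence. */
bool cohabit[MAX_TYPES][MAX_TYPES];

/* Record a union definition whose members are the given struct type ids. */
void note_union(const int *members, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (i != j)
                cohabit[members[i]][members[j]] = true;
}

/* Would accesses to the common initial sequence of types a and b have to
   be treated as possibly aliasing, given the unions recorded so far? */
bool cis_may_alias(int a, int b) {
    return a == b || cohabit[a][b];
}
```

With ids 0, 1 and 2 standing for three struct types, recording a union of types 0 and 1 makes `cis_may_alias(0, 1)` and `cis_may_alias(1, 0)` true, while `cis_may_alias(0, 2)` stays false. Whether maintaining and consulting such a table is cheap enough in a real optimiser is exactly the point under dispute above.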