It took me a while to realize what the crux of the issue is here. DR236 discusses it. The issue is actually about passing a function pointers that point to overlapping storage, and whether the compiler is allowed to assume that such pointers do not alias each other.
If we were just discussing aliasing of union members, it would be simpler. In the following code:
    u.i = 5;
    u.l = 6;
    printf("%d\n", u.i);
the behaviour is undefined because the effective type of `u` is `long`; i.e. the storage of `u` contains a value that was stored as a `long`. But accessing these bytes via an lvalue of type `int` violates the aliasing rules of 6.5p7. The text about inactive union members having unspecified values does not apply (IMO); the aliasing rules trump that, and that text comes into play when aliasing rules are not violated, for example, when accessed via an lvalue of character type.
If we exchange the order of the first two lines above then the program would be well-defined.
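As a rough sketch of the two cases I consider unproblematic (storing the `int` member last, and inspecting the bytes through character-type access), something like the following could be used. The names `int_or_long`, `store_long_then_int` and `read_leading_int_bytes` are made up for illustration, and `memcpy` stands in for access via an lvalue of character type.

```c
#include <string.h>

/* Hypothetical union, assumed to resemble the one in the question. */
union int_or_long {
    int i;
    long l;
};

/* The exchanged order: the int member is stored last and then read
   back through the same member. */
int store_long_then_int(void)
{
    union int_or_long u;
    u.l = 6;
    u.i = 5;
    return u.i;
}

/* Copies the first sizeof(int) bytes of the union into a separate int.
   memcpy accesses the storage as bytes, so no int lvalue is used to
   read storage that was last written as a long. */
int read_leading_int_bytes(const union int_or_long *u)
{
    int out;
    memcpy(&out, u, sizeof out);
    return out;
}
```

If `u.l` was the last member stored, what `read_leading_int_bytes` returns depends on the representation of `long` (size and byte order), so I would not rely on any particular value there.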
However, things all seem to change when the accesses are "hidden" behind pointers passed to a function.
DR236 addresses this via two examples. Both examples have `check()` as in this post. Example 1 `malloc`s some memory and passes `h` and `k` both pointing to the start of that block. Example 2 has a union similar to this post.
Their conclusion is that Example 1 is "unresolved", and Example 2 is UB. However, this excellent blog post points out that the logic used by DR236 in reaching these conclusions is inconsistent. (Thanks to Tor Klingberg for finding this).
The last line of DR236 also says:
Both programs invoke undefined behavior, by calling function f with pointers `qi` and `qd` that have different types but designate the same region of storage. The translator has every right to rearrange accesses to `*qi` and `*qd` by the usual aliasing rules.
(apparently in contradiction of the earlier claim that Example 1 was unresolved).
This quote suggests that the compiler is allowed to assume that two pointers passed to a function are `restrict` if they have different types; however, I cannot find any wording in the Standard to this effect, or even addressing the issue of the compiler re-ordering accesses through pointers.
It has been suggested that the aliasing rules allow the compiler to conclude that an `int *` and a `long *` cannot access the same memory. However, Examples 1 and 2 flatly contradict this.
If the pointers had the same type, then I think we agree that the compiler cannot reorder the accesses, because they might both point to the same object. The compiler has to assume the pointers are not `restrict` unless specifically declared as such.
Yet, I fail to see the difference between this case and the cases of Examples 1 and 2.
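To make the same-type case concrete, here is a minimal sketch. The function names `store_both` and `store_both_restrict` are hypothetical and only serve the illustration.

```c
/* Same-type pointers without restrict: a and b may point to the same
   int, so the read of *a must observe the store through b. */
int store_both(int *a, int *b)
{
    *a = 5;
    *b = 6;
    return *a;   /* 6 if a == b, otherwise 5 */
}

/* With restrict the caller promises that the two objects do not
   overlap, so a compiler may keep the 5 without reloading *a.
   Calling this with a == b would break that promise. */
int store_both_restrict(int *restrict a, int *restrict b)
{
    *a = 5;
    *b = 6;
    return *a;
}
```

Calling `store_both(&x, &x)` is fine and gives 6; the question is whether anything in the Standard gives an `int *` / `long *` pair the treatment of the second function.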
DR236 also says:
Common understanding is that the union declaration must be visible in the translation unit.
which again contradicts the claim that Example 2 is UB, because in Example 2 all of the code is in the same translation unit.
My conclusion: it seems to me that the C99 wording indicates that the compiler should not be allowed to re-order `*h = 5;` and `*k = 6;` in case they alias overlapping storage, notwithstanding the fact that DR236 contradicts the C99 wording and does not clarify matters. But reading `*h` after that should cause undefined behaviour, so the compiler is allowed to generate output of `5` or `6`, or anything else.
In my reading, if you modify `check()` to be `*k = 6; *h = 5;` then it should be well-defined to print `5`. It'd be interesting to see whether a compiler still does something else in this case, and also the compiler's rationale if it does.