As Kamil explains, it's UB. Even int and long (or long and long long) aren't alias-compatible, even when they're the same size. (But interestingly, unsigned int is compatible with int.)
It's nothing to do with being the same size, or using the same register-set as suggested in a comment. It's mainly a way to let compilers assume that pointers to different types don't point to overlapping memory when optimizing. They still have to support C99 union type-punning, not just memcpy. So for example a dst[i] = src[i] loop doesn't need to check for possible overlap when unrolling or vectorizing, if dst and src have different types. (See footnote 1.)
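To make the union and memcpy options concrete, here is a minimal sketch of reading the bits of a float without an aliasing violation. The function names are made up for this example, and it assumes float and uint32_t are both 32 bits wide (IEEE binary32, as on mainstream targets).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names; assumes sizeof(float) == sizeof(uint32_t). */

/* Well-defined in ISO C: copy the object representation. */
uint32_t float_bits_memcpy(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Well-defined in C99 and later: union type-punning. */
uint32_t float_bits_union(float x) {
    union { float f; uint32_t u; } pun;
    pun.f = x;
    return pun.u;
}

/* Undefined behaviour, shown only for contrast: reads a float object
   through an lvalue of an incompatible type.
uint32_t float_bits_cast(float x) { return *(uint32_t *)&x; }
*/
```

Modern compilers typically turn the memcpy version into a single register move, so there's usually no performance reason to prefer the pointer cast.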
If you're accessing the same integer data, the standard requires that you use the exact same type, modulo only things like signed vs. unsigned and const. Or that you use (unsigned) char*, which is like GNU C __attribute__((may_alias)).
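A rough sketch of both escape hatches, with hypothetical names: same_bytes inspects arbitrary objects through unsigned char*, which ISO C always allows, while aliasing_u32 uses the GNU C may_alias attribute (GCC and clang only, not ISO C). The second function also assumes unsigned int and float are both 32 bits with compatible alignment.

```c
#include <stddef.h>

/* ISO C: any object's bytes may be read through unsigned char*. */
int same_bytes(const void *a, const void *b, size_t n) {
    const unsigned char *pa = a, *pb = b;
    for (size_t i = 0; i < n; i++)
        if (pa[i] != pb[i])
            return 0;
    return 1;
}

/* GNU C extension: a type that is allowed to alias anything, like char. */
typedef unsigned int __attribute__((may_alias)) aliasing_u32;

/* Assumes sizeof(unsigned int) == sizeof(float) == 4. */
unsigned int float_bits_may_alias(const float *p) {
    return *(const aliasing_u32 *)p;
}
```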
The other part of your question seems to be why it appears to work in practice, despite the UB.
Your godbolt link forgot to link the actual compilers you tried.
https://godbolt.org/z/rvj3d4e4o shows GCC4.1, from before GCC went out of its way to support "obvious" local compile-time-visible cases like this, so that it sometimes doesn't break people's buggy code that uses non-portable idioms like this.
It loads garbage from stack memory, unless you use -fno-strict-aliasing to make it store to that location first (with movss). (Store/reload instead of movd %xmm0, %eax is a missed-optimization bug that's been fixed in later GCC versions for most cases.)
f:                              # GCC4.1 -O3
    movl    -4(%rsp), %eax
    ret

f:                              # GCC4.1 -O3 -fno-strict-aliasing
    movss   %xmm0, -4(%rsp)
    movl    -4(%rsp), %eax
    ret
Even that old GCC version prints "warning: dereferencing type-punned pointer will break strict-aliasing rules", which should make it obvious that GCC notices this and does not consider it well-defined. Later GCC versions that do choose to support this code still warn.
It's debatable whether it's better to sometimes work in simple cases but break other times, vs. always failing. But given that GCC -Wall does still warn about it, that's probably a reasonable tradeoff that keeps things convenient for people dealing with legacy code or porting from MSVC. Another option would be to always break it unless people use -fno-strict-aliasing, which they should if dealing with codebases that depend on this behaviour.
Being UB doesn't mean required-to-fail
Just the opposite; it would take tons of extra work to actually trap on every signed overflow in the C abstract machine, for example, especially when optimizing stuff like 2 + c - 3 into c - 1. That's what gcc -fsanitize=undefined tries to do, adding x86 jo instructions after additions (except it still does constant-propagation, so it's just adding -1 and doesn't detect the temporary overflow when c is INT_MAX: https://godbolt.org/z/WM9jGT3ac). And it seems strict-aliasing is not one of the kinds of UB it tries to detect at run time.
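For a sense of what per-operation checking costs at the source level, here is a minimal sketch of an explicitly checked addition. checked_add is a hypothetical helper, not something the sanitizer or the standard library provides; it tests the operands before adding so the addition itself never overflows.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true and stores a + b in *out when the sum
   is representable as int; returns false and leaves *out untouched otherwise. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;   /* a + b would overflow */
    *out = a + b;
    return true;
}
```

Doing this on every intermediate result is exactly the kind of work an optimizer gets to skip when overflow is UB.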
See also the LLVM blog article: What Every C Programmer Should Know About Undefined Behavior
An implementation is free to define behaviour the ISO C standard leaves undefined
For example, MSVC always defines this aliasing behaviour, like GCC/clang/ICC do with -fno-strict-aliasing. Of course, that doesn't change the fact that pure ISO C leaves it undefined.
It just means that on those specific C implementations, the code is guaranteed to work the way you want, rather than happening to do so by chance or by de-facto compiler behaviour if it's simple enough for modern GCC to recognize and do the more "friendly" thing.
It's just like gcc -fwrapv, which defines signed-integer overflow as wrapping.
Footnote 1: example of strict-aliasing helping code-gen
#define QUALIFIER // restrict
void convert(float *QUALIFIER pf, const int *pi) {
    for (int i = 0; i < 10240; i++) {
        pf[i] = pi[i];
    }
}
Godbolt shows that with the -O3 defaults for GCC11.2 for x86-64, we get just a SIMD loop with movdqu / cvtdq2ps / movups and loop overhead. With -O3 -fno-strict-aliasing, we get two versions of the loop, and an overlap check to see if we can run the scalar or the SIMD version.
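For comparison, here is a sketch of the same loop with the no-overlap promise spelled out via restrict instead of being inferred from the types. The name convert_restrict and the length parameter n are additions for this example. One would expect this to let the compiler skip the overlap check even under -fno-strict-aliasing, but check the asm for your own compiler and options.

```c
/* Hypothetical variant of convert(): the caller promises that the memory
   written through pf is not accessed through pi during the call. */
void convert_restrict(float *restrict pf, const int *restrict pi, int n) {
    for (int i = 0; i < n; i++) {
        pf[i] = (float)pi[i];
    }
}
```

If a caller breaks that promise by passing overlapping buffers, the behaviour is undefined, which is the tradeoff of restrict.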
Are there actual cases where strict aliasing helps code generation, in which the same cannot be achieved with restrict?
You might well have a pointer that might point into either of two int arrays, but definitely not at any float variable, so you can't use restrict on it. Strict-aliasing will let the compiler still avoid spill/reload of float objects around stores through the pointer, even if the float objects are global vars or otherwise aren't provably local to the function. (Escape analysis.)
Or a struct node * that definitely isn't the same type as the payload in a tree.
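A small sketch of the int-arrays-plus-float case, using made-up names (table_a, table_b, gain, scale_all). Here p may legitimately point into either global array, which the loop also reads directly, so restrict on p would be a false promise. Under strict aliasing the compiler may still assume the int stores through p don't modify the float gain, so it can keep gain in a register instead of reloading it every iteration.

```c
int table_a[4], table_b[4];   /* p may point into either of these */
float gain = 2.0f;            /* global float: can't be proven local */

/* Hypothetical example: int stores through p may alias table_a or table_b,
   but (with strict aliasing) not the float object gain. */
void scale_all(int *p, int n) {
    for (int i = 0; i < n; i++) {
        p[i] = (int)(gain * (float)(table_a[i % 4] + table_b[i % 4]));
    }
}
```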
Also, most code doesn't use restrict all over the place. It could get quite cumbersome. Not just in loops, but in every function that deals with pointers to structs. And if you get it wrong and promise something that's not true, your code's broken.