I believe 6.5p7 in the C standard defines the so-called strict aliasing rule as follows.
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
Here's a simple example that shows GCC optimizing on the assumption that the rule is obeyed.
int IF(int *i, float *f) {
    *i = -1;
    *f = 0;
    return *i;
}

IF:
    mov DWORD PTR [rdi], -1
    mov eax, -1
    mov DWORD PTR [rsi], 0x00000000
    ret
The load for return *i is omitted on the assumption that int and float cannot alias.
Then let's consider case 6, which says an object may be accessed by an lvalue expression of character type (for example, through a char *).
int IC(int *i, char *c) {
    *i = -1;
    *c = 0;
    return *i;
}

IC:
    mov DWORD PTR [rdi], -1
    mov BYTE PTR [rsi], 0
    mov eax, DWORD PTR [rdi]
    ret
Now there is a load for return *i because i and c could overlap according to the rules, and *c = 0 could change what's in *i.
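To make the overlap concrete, here is a minimal sketch of a call where c really does point into *i. The helper name call_IC_overlapping is made up for illustration, and I am assuming int has no padding bits, so that clearing one byte of an all-ones int changes its value.

```c
#include <assert.h>

/* Same function as above. */
int IC(int *i, char *c) {
    *i = -1;
    *c = 0;        /* may overwrite one byte of *i */
    return *i;     /* so *i has to be reloaded here */
}

/* Hypothetical helper: both pointers refer to the same int object. */
int call_IC_overlapping(void) {
    int x = 0;
    int r = IC(&x, (char *)&x);
    assert(r == x);    /* the return value reflects the byte store */
    return r;          /* not -1; the exact value depends on endianness */
}
```

If the compiler had returned a constant -1 as it does in IF, this call would give the wrong result.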
Then can we also modify a char through an int *? Should the compiler care that such a thing might happen?
char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}

CI: #GCC
    mov BYTE PTR [rdi], -1
    mov DWORD PTR [rsi], 0
    movzx eax, BYTE PTR [rdi]
    ret

CI: #Clang
    mov byte ptr [rdi], -1
    mov dword ptr [rsi], 0
    mov al, byte ptr [rdi]
    ret
Looking at the assembly output, both GCC and Clang seem to think a char can be modified by access through int *.
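One call that, as far as I can tell, forces this reload is sketched below. The object is declared as an int, so the store through int * matches its type and the accesses through char * fall under case 6. The helper name call_CI_overlapping is made up for illustration.

```c
#include <assert.h>

/* Same function as above. */
char CI(char *c, int *i) {
    *c = -1;       /* writes one byte of the object */
    *i = 0;        /* overwrites every byte of the int, including *c */
    return *c;     /* must be reloaded: the byte is now 0 */
}

/* Hypothetical helper: c points at the first byte of the int that i points to. */
char call_CI_overlapping(void) {
    int x = 1;
    char r = CI((char *)&x, &x);
    assert(x == 0);    /* the int store happened last */
    return r;
}
```

Here the char * points into an int object, not the other way around, which may be the case the compilers are guarding against.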
Maybe it's obvious that A and B overlapping means A overlaps B and B overlaps A. However, I found this detailed answer which emphasizes in boldface that,
Note that may_alias, like the char* aliasing rule, only goes one way: it is not guaranteed to be safe to use int32_t* to read a __m256. It might not even be safe to use float* to read a __m256. Just like it's not safe to do char buf[1024]; int *p = (int*)buf;.
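To make the last sentence of the quote concrete: buf is declared as an array of char, so reading it through an int lvalue is not one of the cases listed in 6.5p7, and buf is not guaranteed to be suitably aligned for int either. A minimal sketch of the usual alternative, copying the bytes with memcpy, is below. The helper names read_int_from_bytes and round_trip are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: read an int stored at the start of a char buffer
   without ever dereferencing an int * that points into the buffer. */
int read_int_from_bytes(const char *bytes) {
    int value;
    memcpy(&value, bytes, sizeof value);   /* byte-wise copy */
    return value;
}

/* Hypothetical helper: store an int into a char buffer and read it back. */
int round_trip(int in) {
    char buf[1024];
    memcpy(buf, &in, sizeof in);
    return read_int_from_bytes(buf);
}
```

The unsafe variant from the quote would read *(int *)buf directly instead of going through memcpy.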
Now I'm really confused. The answer is also about GCC vector types, which have a may_alias attribute so that they can alias other types in the same way a char can.
At least in the following example, GCC seems to think overlapping access can happen in both directions.
#include <immintrin.h>

int IV(int *i, __m128i *v) {
    *i = -1;
    *v = _mm_setzero_si128();
    return *i;
}

__m128i VI(int *i, __m128i *v) {
    *v = _mm_set1_epi32(-1);
    *i = 0;
    return *v;
}

IV:
    pxor xmm0, xmm0
    mov DWORD PTR [rdi], -1
    movaps XMMWORD PTR [rsi], xmm0
    mov eax, DWORD PTR [rdi]
    ret

VI:
    pcmpeqd xmm0, xmm0
    movaps XMMWORD PTR [rsi], xmm0
    mov DWORD PTR [rdi], 0
    movdqa xmm0, XMMWORD PTR [rsi]
    ret
https://godbolt.org/z/ab5EMx3bb
But am I missing something? Is strict aliasing one-way?
Additionally, after reading the current answers and comments, I thought maybe this code is not allowed by the standard.
typedef struct {int i;} S;
S s;
int *p = (int *)&s;
*p = 1;
Note that (int *)&s is different from &s.i. My current interpretation is that an object of type S is being accessed by an lvalue expression of type int, and this case is not listed in 6.5p7.
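For comparison, here is a minimal sketch that checks the two pointers against each other. It relies on 6.7.2.1 (paragraph 15 in C11), which says that a pointer to a structure object, suitably converted, points to its initial member. The helper name write_through_cast is made up for illustration, and whether this settles the 6.5p7 reading above is part of what I am asking.

```c
#include <assert.h>

typedef struct { int i; } S;

/* Hypothetical helper: write through the converted pointer, read via the member. */
int write_through_cast(void) {
    S s = { 0 };
    int *p = (int *)&s;
    assert(p == &s.i);   /* the converted pointer points to the initial member */
    *p = 1;
    return s.i;
}
```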