Using godbolt.org x86-64 gcc 11.2, this code...
typedef int v4i __attribute__ ((vector_size (16)));
typedef union {
    v4i v;
} int4;
int4 mul(int4 l, int4 r)
{
    return (int4){.v=l.v * r.v};
}
...produces this assembly (when compiled with -O3 -mavx)...
mul:
vpmulld xmm0, xmm0, xmm1
ret
However, this code...
typedef int v4i __attribute__ ((vector_size (16)));
typedef union {
    v4i v;
    struct {int x,y,z,w;}; // this line is the change
    int i[4]; // this one too
} int4;
int4 mul(int4 l, int4 r)
{
    return (int4){.v=l.v * r.v};
}
...produces this assembly (also compiled with -O3 -mavx)...
mul:
mov QWORD PTR [rsp-40], rdi
mov QWORD PTR [rsp-32], rsi
vmovdqa xmm1, XMMWORD PTR [rsp-40]
mov QWORD PTR [rsp-24], rdx
mov QWORD PTR [rsp-16], rcx
vpmulld xmm0, xmm1, XMMWORD PTR [rsp-24]
vmovdqa XMMWORD PTR [rsp-40], xmm0
mov rax, QWORD PTR [rsp-40]
mov rdx, QWORD PTR [rsp-32]
ret
x86-64 clang 13.0.1 has similar results.
So my question is: how can I convince gcc (and/or clang) that these two blocks of code should produce the same output?
I've tried __attribute__ ((aligned)), removing either the int i[4]; or the struct, applying __attribute__ ((packed)) to the struct, and I even gave __attribute__ ((transparent_union)) a go. Whatever magic status __attribute__ ((vector_size (16))) bestows is broken by adding anything to the union.
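For what it's worth, the second listing looks like a calling-convention effect rather than a missed optimization: the arguments arrive in rdi/rsi and rdx/rcx and the result leaves in rax/rdx. That is what the System V x86-64 ABI appears to prescribe once the union has integer members overlapping the vector. Below is a minimal sketch of two possible ways around it, assuming the function does not have to keep the union in an externally visible signature. The names mul_v, mul_inline and dot are hypothetical and not part of the code above.

```c
/* Sketch only: mul_v, mul_inline and dot are hypothetical names. */
typedef int v4i __attribute__ ((vector_size (16)));

typedef union {
    v4i v;
    struct { int x, y, z, w; };  /* anonymous struct (C11 / GNU extension) */
    int i[4];
} int4;

/* Option 1: use the bare vector type at the function boundary, so the
   arguments and the result are passed as vectors rather than as a union. */
v4i mul_v(v4i l, v4i r)
{
    return l * r;
}

/* Option 2: keep the union in the signature, but make the function
   static inline so the compiler can inline it at the call site, where
   no argument-passing convention has to be honoured. */
static inline int4 mul_inline(int4 l, int4 r)
{
    return (int4){ .v = l.v * r.v };
}

/* Example caller: vectors cross the boundary, the union is only used
   locally for member access. */
int dot(v4i a, v4i b)
{
    int4 l = { .v = a };
    int4 r = { .v = b };
    int4 p = mul_inline(l, r);
    return p.x + p.y + p.z + p.w;
}
```

Whether the inlined version compiles down to a single vpmulld depends on the compiler and flags, so it is worth checking the output on godbolt.org with the same -O3 -mavx settings.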