TL;DR: it looks like POWER7 is the minimum requirement for 64-bit element size with AltiVec. This is part of VSX (Vector Scalar Extension), which Wikipedia confirms first appeared in POWER7.
It's very likely that gcc knows what it's doing, and enables 64-bit element-size vector intrinsics with the lowest necessary -mcpu= requirement.
#include <altivec.h>
auto vec32(void) { // compiles with your options: Power4
return vec_splats((int) 1);
}
// gcc error: use of 'long long' in AltiVec types is invalid without -mvsx
vector long long vec64(void) {
return vec_splats((long long) 1);
}
(With auto instead of vector long long, the 2nd function compiles to returning in two 64-bit integer registers.)
Adding -mvsx lets the 2nd function compile. Using -mcpu=power7 also works, but power6 doesn't.
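If the same source has to build both with and without VSX, one option is to guard the 64-bit-element code at compile time. This is only a sketch: it assumes gcc's predefined __VSX__ macro (set when -mvsx or -mcpu=power7 and later is in effect), and built_with_vsx and splat_i64 are hypothetical names made up for illustration.

```cpp
#include <cstring>
#if defined(__VSX__)
#include <altivec.h>   // only pulled in when the target actually has VSX
#endif

// 1 if this translation unit was built with VSX enabled, else 0.
constexpr int built_with_vsx() {
#if defined(__VSX__)
    return 1;
#else
    return 0;
#endif
}

// Hypothetical helper: broadcast x into two 64-bit lanes and copy them out.
void splat_i64(long long x, long long out[2]) {
#if defined(__VSX__)
    __vector long long v = vec_splats(x);   // needs -mvsx or -mcpu=power7
    std::memcpy(out, &v, sizeof v);         // memcpy avoids alignment assumptions
#else
    out[0] = x;                             // scalar fallback, no vector types
    out[1] = x;
#endif
}
```

Without VSX the fallback branch never mentions vector long long, so the "invalid without -mvsx" error can't trigger.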
source + asm on Godbolt (PowerPC64 gcc6.3)
# with auto without VSX:
vec64(): # -O3 -mcpu=power4 -maltivec -mregnames
li %r4,1
li %r3,1
blr
vec64(): # -O3 -mcpu=power7 -maltivec -mregnames
.LCF2:
0: addis 2,12,.TOC.-.LCF2@ha
addi 2,2,.TOC.-.LCF2@l
addis %r9,%r2,.LC0@toc@ha
addi %r9,%r9,.LC0@toc@l # TOC-relative addressing (via r2) for the static constant
lxvd2x %vs34,0,%r9 # VSX vector load of two doublewords
xxpermdi %vs34,%vs34,%vs34,2
blr
.LC0: # in .rodata
.quad 1
.quad 1
And BTW, vec_splats (splat scalar) with a small compile-time constant compiles to a single instruction. But with a runtime variable (e.g. a function arg), it compiles to an integer store / vector load / vector-splat (like the vec_splat intrinsic). Apparently there isn't a single instruction for int->vec.
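For reference, the runtime-variable case looks something like the following. It's a minimal sketch rather than anything from the Godbolt link: splat_i32 is a hypothetical name, and the non-AltiVec branch exists only so the snippet still builds on other targets.

```cpp
#include <cstring>
#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

// Hypothetical helper: broadcast a runtime int to four 32-bit lanes.
void splat_i32(int x, int out[4]) {
#if defined(__ALTIVEC__)
    // x is not a compile-time constant here, so the compiler can't use a
    // splat-immediate instruction and has to get x into a vector register.
    __vector signed int v = vec_splats(x);
    std::memcpy(out, &v, sizeof v);
#else
    for (int i = 0; i < 4; ++i) out[i] = x;   // portable fallback
#endif
}
```

Compiling this with -O3 -maltivec and looking at the asm should show the store / load / splat sequence described above, though the exact instructions depend on the gcc version and -mcpu.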
The vec_splat_s32 and related intrinsics only accept a small (5-bit) constant, so they only compile in cases where the compiler can use the corresponding splat-immediate instruction.
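A 5-bit signed immediate covers -16 to 15, so a quick sketch of what is and isn't accepted might look like this. fits_splat_immediate and splat_minus_one are hypothetical helpers, not part of altivec.h, and the predicate returns int rather than bool only to stay clear of AltiVec's own use of the bool keyword.

```cpp
#include <cstring>
#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

// Hypothetical check: does c fit a 5-bit signed immediate (-16..15)?
constexpr int fits_splat_immediate(long long c) {
    return (c >= -16 && c <= 15) ? 1 : 0;
}

// Hypothetical helper: splat the literal -1 into four 32-bit lanes.
void splat_minus_one(int out[4]) {
#if defined(__ALTIVEC__)
    __vector signed int v = vec_splat_s32(-1);   // literal in range: accepted
    // vec_splat_s32(16) would be rejected: 16 doesn't fit in 5 signed bits.
    std::memcpy(out, &v, sizeof v);
#else
    for (int i = 0; i < 4; ++i) out[i] = -1;     // portable fallback
#endif
}
```

For anything outside that range, or for a runtime value, vec_splats is the one to reach for.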
This Intel SSE to PowerPC AltiVec migration looks mostly good, but got that wrong (it claims that vec_splats splats a signed byte).