What is the availability of 'vector long long'?

Question

I'm testing on an old PowerMac G5, which is a Power4 machine. The build is failing:

$ make
...
g++ -DNDEBUG -g2 -O3 -mcpu=power4 -maltivec -c ppc-simd.cpp
ppc-crypto.h:36: error: use of 'long long' in AltiVec types is invalid
make: *** [ppc-simd.o] Error 1

The failure is due to:

typedef __vector unsigned long long uint64x2_p8;

I'm having trouble determining when I should make the typedef available. With -mcpu=power4 -maltivec the machine reports 64-bit availability:

$ gcc -mcpu=power4 -maltivec -dM -E - </dev/null | sort | egrep -i -E 'power|ARCH'
#define _ARCH_PPC 1
#define _ARCH_PPC64 1
#define __POWERPC__ 1

The OpenPOWER manual, section 6.1 "Vector Data Types", has good information on vector data types, but it does not say when vector long long is available.

What is the availability of __vector unsigned long long? When can I use the typedef?

Although the G5 had 64 bit *architecture*, AltiVec had no support for 64 bit vector elements back in those days, so no 64 bit ints and no double precision floats either. — Paul R, Dec 12 '17 at 12:25
Thanks Paul. We partition code around Power4 (Altivec), Power7 (unaligned loads/stores) and Power8 (in-core crypto). I guess my question is, do we need to use Power5 instead of Power4 for the 64-bit type? — jww, Dec 12 '17 at 12:35
I'm not sure which generation of POWER introduced 64 bit element types for SIMD - you'd need to do some research to find that out. Obviously you won't be able to test your code on a G5 though, if you need to use 64 bit elements. — Paul R, Dec 12 '17 at 12:38
Thanks again Paul. The research is not turning up too much. I can find partial patches to GCC but nothing definitive. Related, would you happen to know when `vec_splats` became available? I'm also catching *`error: 'vec_splats' was not declared in this scope`* on Power4. — jww, Dec 12 '17 at 13:54
I found some slides which referred to 2 x 64 bit SIMD in POWER 7, but I don't know if that is the earliest. As for `vec_splats` - in AltiVec this was qualified by the element size, so `vec_splat_s8`, `vec_splat_s16`, etc. It's possible that there may be a newer C++ interface which just uses an overloaded prototype `vec_splats` for all of these different intrinsics, but I haven't done much with POWER since the days of the G5 and AltiVec, other than a little benchmarking, so that's just a guess. — Paul R, Dec 12 '17 at 13:59
@PaulR: `vec_splats` also allows runtime variables, but the `vec_splat_s16` and so on only compile if they can use the splat-immediate single instruction. But yeah, it appears `vec_splats` is a good way to tell when gcc thinks 64-bit-element vectors are supported, and it's probably right. — Peter Cordes, Dec 26 '17 at 02:28

score 3 · Accepted Answer · answered Dec 26 '17 at 02:26

TL;DR: it looks like POWER7 is the minimum requirement for 64-bit element size with AltiVec. 64-bit elements are part of VSX (Vector Scalar Extension), which Wikipedia confirms first appeared in POWER7.

It's very likely that gcc knows what it's doing, and enables 64-bit element-size vector intrinsics with the lowest necessary -mcpu= requirement.

#include <altivec.h>

auto vec32(void) {       // compiles with your options: Power4
    return vec_splats((int) 1);
}

// gcc error: use of 'long long' in AltiVec types is invalid without -mvsx
vector long long vec64(void) {
    return vec_splats((long long) 1);
}

(With auto instead of vector long long as the return type, the 2nd function compiles even without VSX, but it returns the value in two 64-bit integer registers rather than in a vector register.)

Adding -mvsx lets the 2nd function compile. Using -mcpu=power7 also works, but -mcpu=power6 doesn't.
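One way to act on this in a header is to key the typedef on the compiler's feature macros rather than on the -mcpu value. This is a minimal sketch, assuming GCC's predefined `__ALTIVEC__` and `__VSX__` macros (set by -maltivec and by -mvsx or -mcpu=power7 and later). `PPC_HAVE_VECTOR_LONG_LONG` and `vector_long_long_bytes` are hypothetical names used only for illustration; they are not part of the original code.

```cpp
// Sketch: expose the 64-bit element vector type only when VSX is enabled.
// __ALTIVEC__ and __VSX__ are compiler-predefined feature macros.
// PPC_HAVE_VECTOR_LONG_LONG is a hypothetical project-local macro.
#if defined(__ALTIVEC__) && defined(__VSX__)
# define PPC_HAVE_VECTOR_LONG_LONG 1
// Same typedef as in the question; __vector is the compiler keyword form.
typedef __vector unsigned long long uint64x2_p8;
#else
# define PPC_HAVE_VECTOR_LONG_LONG 0
#endif

// Hypothetical helper: size of the 64-bit element vector type in bytes,
// or 0 when the type is not available for the current compiler options.
inline int vector_long_long_bytes() {
#if PPC_HAVE_VECTOR_LONG_LONG
    return static_cast<int>(sizeof(uint64x2_p8));
#else
    return 0;
#endif
}
```

With -mcpu=power4 -maltivec the macro should evaluate to 0 and the typedef is skipped, so code that needs 64-bit elements can be wrapped in `#if PPC_HAVE_VECTOR_LONG_LONG`.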

source + asm on Godbolt (PowerPC64 gcc6.3)

# with auto without VSX:
vec64():     # -O3 -mcpu=power4 -maltivec -mregnames
    li %r4,1
    li %r3,1
    blr

vec64():  # -O3 -mcpu=power7 -maltivec -mregnames
.LCF2:
0:  addis 2,12,.TOC.-.LCF2@ha
    addi 2,2,.TOC.-.LCF2@l
    addis %r9,%r2,.LC0@toc@ha
    addi %r9,%r9,.LC0@toc@l       # TOC-relative address of the static constant .LC0
    lxvd2x %vs34,0,%r9            # VSX load of two 64-bit doublewords
    xxpermdi %vs34,%vs34,%vs34,2  # swap the two doublewords (probably a little-endian fix-up)
    blr


.LC0:    # in .rodata
    .quad   1
    .quad   1

And BTW, vec_splats (splat scalar) with a small constant such as (int) 1 compiles to a single instruction. But with a runtime variable (e.g. a function arg), it compiles to an integer store / vector load / vector-splat (like the vec_splat intrinsic). Apparently there isn't a single instruction for int->vec.

The vec_splat_s32 and related intrinsics only accept a small (5-bit) constant, so they only compile in cases where the compiler can use the corresponding splat-immediate instruction.
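To make the difference concrete, here is a small sketch. The 5-bit signed immediate covers -16 through 15, which the hypothetical helper `fits_splat_immediate` checks. The two AltiVec functions (`splat_small` and `splat_any`, also hypothetical names) are guarded so the block still compiles on non-PowerPC targets.

```cpp
#if defined(__ALTIVEC__)
# include <altivec.h>
#endif

// Hypothetical helper: true if v fits the 5-bit signed immediate
// (-16..15) that vec_splat_s8 / vec_splat_s16 / vec_splat_s32 accept.
constexpr bool fits_splat_immediate(int v) {
    return v >= -16 && v <= 15;
}

#if defined(__ALTIVEC__)
// Splat-immediate form: the argument must be a literal in -16..15.
inline __vector signed int splat_small() {
    return vec_splat_s32(7);
}

// vec_splats accepts any value, including one known only at run time;
// the compiler chooses the instruction sequence.
inline __vector signed int splat_any(int x) {
    return vec_splats(x);
}
#endif
```

A call such as vec_splat_s32(100) should be rejected at compile time, while splat_any(100) is fine because vec_splats is not tied to the immediate form.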

This Intel SSE to PowerPC AltiVec migration guide looks mostly good, but it gets that wrong (it claims that vec_splats splats a signed byte).
