We permute a vector in a few places, and we need the distinguished 0 value to use with the vec_perm built-in. We have not been able to locate a vec_zero() or similar, so we would like to know how we should handle things.

The code currently uses two strategies. The first strategy is a vector load:

__attribute__((aligned(16)))
static const uint8_t z[16] =
    { 0,0,0,0,  0,0,0,0,  0,0,0,0,  0,0,0,0 };

const uint8x16_p8 zero = vec_ld(0, z);

The second strategy is an xor using the mask we intend to use:

__attribute__((aligned(16)))
static const uint8_t m[16] =
    { 15,14,13,12,  11,10,9,8,  7,6,5,4, 3,2,1,0 };

const uint8x16_p8 mask = vec_ld(0, m);
const uint8x16_p8 zero = vec_xor(mask, mask);

We have not run benchmarks yet, so we don't know whether one is better than the other. The first strategy uses a VMX load, which could be expensive. The second strategy avoids the extra load but introduces a data dependency on the mask.

How do we obtain a VSX value of zero?

jww
  • If you just initialise the vector variable to zero in the usual way (note that there are two different syntaxes for this, depending on which compiler you are using) then the compiler will typically choose whichever method is more efficient to splat zero to the vector. – Paul R Sep 10 '17 at 07:11
  • You can of course just use the immediate form of vec_splat for zero and other small values. – Paul R Sep 10 '17 at 07:16

1 Answer

I'd suggest letting the compiler handle it for you. Just initialise to zero:

const uint8x16_p8 zero = {0};

This will likely compile to an xor.

For example, a simple test:

vector char foo(void)
{
    const vector char zero = {0};
    return zero;
}

On my machine, this compiles to:

0000000000000000 <foo>:
   0:   d7 14 42 f0     xxlxor  vs34,vs34,vs34
   4:   20 00 80 4e     blr
    ...
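As a rough sketch only (not a benchmark), the two compiler-driven options mentioned on this page can be put side by side: a plain zero initialiser and the splat-immediate built-in vec_splat_u8. The helper names zero_by_init, zero_by_splat, all_bytes_zero and check_zero_helpers are hypothetical, the brace initialiser assumes GCC/Clang syntax, and the non-POWER fallback (the GCC/Clang generic vector extension) is an assumption added only so the snippet also builds off-POWER.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__)
# include <altivec.h>
typedef __vector unsigned char uint8x16_p8;
/* Splat-immediate form; the literal must fit in 5 signed bits (-16..15). */
# define SPLAT_ZERO() vec_splat_u8(0)
#else
/* Assumption: off-POWER fallback using the GCC/Clang generic vector
   extension, present only so this sketch compiles elsewhere. */
typedef unsigned char uint8x16_p8 __attribute__((vector_size(16)));
# define SPLAT_ZERO() ((uint8x16_p8){0})
#endif

/* Hypothetical helper: zero via a plain initialiser (GCC/Clang brace
   syntax); the compiler chooses how to materialise the value. */
static uint8x16_p8 zero_by_init(void)
{
    const uint8x16_p8 zero = {0};
    return zero;
}

/* Hypothetical helper: zero via the splat-immediate built-in. */
static uint8x16_p8 zero_by_splat(void)
{
    return SPLAT_ZERO();
}

/* Returns 1 if every byte of v is zero, 0 otherwise. */
static int all_bytes_zero(uint8x16_p8 v)
{
    uint8_t b[16];
    memcpy(b, &v, sizeof b);
    for (size_t i = 0; i < sizeof b; ++i)
        if (b[i] != 0)
            return 0;
    return 1;
}

/* Sanity check that both helpers produce an all-zero vector. */
static void check_zero_helpers(void)
{
    assert(all_bytes_zero(zero_by_init()));
    assert(all_bytes_zero(zero_by_splat()));
}
```

Neither form needs a memory load or an existing register value. Which instruction the compiler actually emits depends on the compiler, target and flags, so it is worth checking the disassembly as above.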
Jeremy Kerr