
I would like to compare two vectors of doubles based on their absolute values.

That is, the vector equivalent of the following:

if (fabs(x) < fabs(y)) {
    ...
}

Is there anything better than just taking the absolute value of each side and following up with a _mm256_cmp_pd?

Interested in all of AVX, AVX2, and AVX-512 flavors.
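For reference, here is a sketch of the straightforward AVX version the question describes: clear both sign bits with a bitwise and-not against -0.0, then compare with _mm256_cmp_pd. This is essentially the and/and/compare sequence EOF's comment links to. The function name abs_lt_avx, the pointer-based signature and the movemask return value are illustrative assumptions. The GCC/Clang target attribute lets it compile without -mavx, but it still needs an AVX-capable CPU at run time. Nothing in it requires AVX2.

```c
#include <immintrin.h>

/* Straightforward version: |x| and |y| via and-not with the sign mask, then an
   ordered floating-point "<". Returns a 4-bit mask, bit i set where |x[i]| < |y[i]|.
   The target attribute (GCC/Clang) enables AVX for this function only. */
__attribute__((target("avx")))
int abs_lt_avx(const double x[4], const double y[4])
{
    const __m256d sign = _mm256_set1_pd(-0.0);                /* only bit 63 set */
    __m256d ax = _mm256_andnot_pd(sign, _mm256_loadu_pd(x));  /* |x| */
    __m256d ay = _mm256_andnot_pd(sign, _mm256_loadu_pd(y));  /* |y| */
    __m256d lt = _mm256_cmp_pd(ax, ay, _CMP_LT_OQ);           /* all-ones where |x| < |y| */
    return _mm256_movemask_pd(lt);                            /* one bit per lane */
}
```

Because _CMP_LT_OQ is an ordered predicate, a lane where either input is NaN comes out false, just like the scalar fabs(x) < fabs(y).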

BeeOnRope
  • Probably the following idea works for AVX-512, but I don't have the hardware to figure out the details: maybe you can use `y_sgnx=_mm512_ternarylogic_epi64(x,y,z,c8)`, with `z=_mm512_set1_epi64(0x7FFFFFFFFFFFFFFFull)`, and a `c8` such that `y_sgnx` is the same as `y`, but with the sign bit of `x`. Cast `x` and `y_sgnx` from double to `epi64` (`_mm512_castpd_si512`). Now you can use an unsigned integer compare with `x` and `y_sgnx` (`_mm512_cmp_epu64_mask`) to get the right mask. See also [here](http://0x80.pl/articles/avx512-ternary-functions.html#bit-select-function-update) – wim Jul 10 '20 at 23:33
  • Another `ternarylogic` idea would be to integer-subtract `x` from `y` then check if the result sign bit equals the sign of `x ^ y` – chtz Jul 11 '20 at 00:15
  • Do you have *any* reason to expect the [obvious](https://godbolt.org/z/TzMxPK) `vandpd; vandpd; vcmpltpd` to be limiting your performance? – EOF Jul 11 '20 at 11:33
  • @EOF no, I don't. – BeeOnRope Jul 11 '20 at 18:43
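chtz's subtraction idea can be checked with a scalar model of one lane, assuming neither input is NaN. Working through the borrow shows which way round it goes: bit 63 of xi - yi equals sign(x) XOR sign(y) XOR the borrow out of bits 62:0, and that borrow is 1 exactly when |x| < |y|. XOR-ing xi ^ yi back in therefore isolates the strict less-than result, whereas computing y - x and testing its sign bit for equality with the sign of x ^ y, as worded in the comment, yields |x| <= |y|. The names dbits and abs_lt_sub are hypothetical, and this is a scalar sketch rather than a vectorized implementation.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as an unsigned 64-bit integer. */
static uint64_t dbits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Returns 1 if |x| < |y|, else 0; assumes neither x nor y is NaN.
   Bit 63 of (xi - yi) is sign(x) ^ sign(y) ^ borrow, where the borrow comes
   from subtracting bits 62:0 and is set exactly when |x| < |y|. */
int abs_lt_sub(double x, double y)
{
    uint64_t xi = dbits(x), yi = dbits(y);
    uint64_t d = xi - yi;               /* unsigned, so it wraps modulo 2^64 */
    return (int)((d ^ xi ^ yi) >> 63);  /* only the borrow is left in bit 63 */
}
```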



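With AVX-512 you can save one µop: instead of 2× vandpd + vcmppd, you can use vpternlogq + vpcmpuq. Note that the solution below assumes that neither number is a NaN; for example, if y is NaN and x is not, the integer compare reports true where the floating-point compare reports false.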
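For comparison, the and/and/compare baseline that this approach improves on might look like the sketch below in AVX-512. It clears the sign bits with _mm512_and_epi64 (AVX-512F) rather than _mm512_and_pd, because the floating-point and form requires AVX-512DQ. The function name, the pointer interface and the unsigned return value are assumptions, chosen so the function can be called from code that is not itself compiled for AVX-512.

```c
#include <immintrin.h>

/* Baseline: clear both sign bits, then an ordered floating-point "<".
   Returns an 8-bit mask with bit i set where |x[i]| < |y[i]|.
   Needs an AVX-512F CPU at run time. */
__attribute__((target("avx512f")))
unsigned abs_lt_avx512_baseline(const double *x, const double *y)
{
    const __m512i mag = _mm512_set1_epi64(0x7FFFFFFFFFFFFFFFLL); /* all bits but the sign */
    __m512i xi = _mm512_and_epi64(_mm512_castpd_si512(_mm512_loadu_pd(x)), mag);
    __m512i yi = _mm512_and_epi64(_mm512_castpd_si512(_mm512_loadu_pd(y)), mag);
    __mmask8 lt = _mm512_cmp_pd_mask(_mm512_castsi512_pd(xi),
                                     _mm512_castsi512_pd(yi), _CMP_LT_OQ);
    return (unsigned)lt;
}
```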

IEEE-754 floating-point numbers have the nice property that, apart from the sign bit, their encoding preserves order: for non-NaN x and y, if x[62:0] is less than y[62:0] as an unsigned integer, then abs(x) < abs(y) as floating-point numbers, and vice versa.
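A scalar way to see this property, assuming neither input is NaN: mask off bit 63 and compare what is left as an unsigned integer. A larger exponent gives a larger bit pattern, and within one exponent a larger mantissa does too; zero and the subnormals sit at the bottom, and infinity sits above every finite value. The function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Bits 62:0 of a double (exponent and mantissa), i.e. the encoding of |d|. */
static uint64_t magnitude_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u & 0x7FFFFFFFFFFFFFFFull;
}

/* For non-NaN x and y this matches fabs(x) < fabs(y), including signed zeros,
   subnormals and infinities. */
int abs_lt_bits(double x, double y)
{
    return magnitude_bits(x) < magnitude_bits(y);
}
```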

So, instead of setting both sign bits to 0, we can copy the sign bit of x to the sign bit of y and compare the result as an unsigned integer. In the (untested) code below, for negative x both xi[63] and yi_sgnx[63] are 1, while for positive x, both xi[63] and yi_sgnx[63] are 0. So the unsigned integer compare actually compares xi[62:0] with yi[62:0], which is just what we need for the comparison abs(x)<abs(y).
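A scalar model of one lane may help check this reasoning (non-NaN inputs assumed; the helper and function names are hypothetical). The bitwise select below does in plain C what the vpternlog step does per 64-bit lane, and the final < plays the role of the unsigned integer compare.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as an unsigned 64-bit integer. */
static uint64_t as_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Give y the sign bit of x, then compare the full 64-bit patterns unsigned.
   Since bit 63 is now equal on both sides, this compares bits 62:0 only. */
int abs_lt_sign_copy(double x, double y)
{
    const uint64_t z = 0x7FFFFFFFFFFFFFFFull;            /* 1s where bits come from y */
    uint64_t xi      = as_bits(x);
    uint64_t yi_sgnx = (as_bits(y) & z) | (xi & ~z);     /* y's magnitude, x's sign */
    return xi < yi_sgnx;
}
```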

The vpternlogq instruction is suitable for copying the sign bit, see here or here. The immediate 0xCA is the truth table of the bitwise select z ? yi : xi, so with z = 0x7FFFFFFFFFFFFFFF the result takes bits 62:0 from yi and bit 63 from xi.
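To double-check the immediate, the per-bit rule of _mm512_ternarylogic_epi64(a, b, c, imm) can be emulated in scalar C: at every bit position, the three input bits form an index (a << 2) | (b << 1) | c that selects one bit of imm. Feeding in 0xF0, 0xCC and 0xAA (the truth tables of a, b and c on their own) returns imm itself, and (0xF0 & 0xCC) | (~0xF0 & 0xAA) = 0xCA, which is the select a ? b : c. The function ternlog64 is a hypothetical helper, not an Intel API.

```c
#include <stdint.h>

/* Scalar emulation of the per-bit rule of vpternlogq: for each bit position i,
   the bits a[i], b[i], c[i] form a 3-bit index into the 8-bit immediate. */
uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, unsigned imm)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        unsigned idx = (unsigned)((((a >> i) & 1) << 2) |
                                  (((b >> i) & 1) << 1) |
                                   ((c >> i) & 1));
        r |= (uint64_t)((imm >> idx) & 1u) << i;
    }
    return r;
}
```

For example, ternlog64(0x7FFFFFFFFFFFFFFF, bits of 2.0, bits of -1.0, 0xCA) gives 0xC000000000000000, the encoding of -2.0: the magnitude of y with the sign of x.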

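#include <immintrin.h>

/* Bit i of the result is set where |x[i]| < |y[i]|; assumes no NaN inputs. */
__mmask8 cmplt_via_ternlog(__m512d x, __m512d y){
    __m512i xi        = _mm512_castpd_si512(x);
    __m512i yi        = _mm512_castpd_si512(y);
    __m512i z         = _mm512_set1_epi64(0x7FFFFFFFFFFFFFFFull);  /* all bits except the sign */
    /* 0xCA: bitwise select z ? yi : xi, i.e. magnitude of y with the sign of x */
    __m512i yi_sgnx   = _mm512_ternarylogic_epi64(z, yi, xi, 0xCA);
    /* Sign bits now match, so this effectively compares xi[62:0] with yi[62:0] */
    return _mm512_cmp_epu64_mask(xi, yi_sgnx, _MM_CMPINT_LT);
}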
wim