I'm using the following to extract the sign bits of an __m128:

const int sign_mask = _mm_movemask_ps(a);

I now want to use the following to blend two vectors:

v_add = _mm_blendv_ps(a, v_add_neg, _mm_castsi128_ps(v_mask));

v_mask needs to come from sign_mask but I cannot find an intrinsic that does this.

The code's purpose is to change the signs of a vector's elements based on the signs in another vector's corresponding elements.

Paul R
IamIC
  • You could store all the possible v_masks in an array, indexed by the sign_mask. But I don't see why you want to go through sign_mask, you should stick to vectors. – Marc Glisse Apr 26 '18 at 12:38
  • I subsequently figured it's easier to use vector AND and OR to do this. There isn't an intrinsic (that I know of) that extracts the sign into a vector. – IamIC Apr 26 '18 at 12:42
  • I'm not sure if you need such an intrinsic, because `_mm_blendv_ps(a, b, c)` uses the sign bits of `c` to choose between the elements of `a` and `b`. Probably that is just what you want? – wim Apr 26 '18 at 15:23

1 Answer

You could use _mm_blendv_ps(a, v_add_neg, a). blendvps takes a vector input, and uses the sign bit of each element as the blend condition for that element.
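As a rough sketch of that suggestion (the helper name `select_by_sign` and the `SSE41_TARGET` macro are made up for illustration, and the target attribute assumes GCC or Clang; with other compilers, enable SSE4.1 through their own build options), passing `a` as its own blend mask looks like this:

```c
#include <smmintrin.h>  /* SSE4.1: _mm_blendv_ps */

/* Assumption: GCC/Clang. Enables SSE4.1 for this one function when the file
   is not already built with -msse4.1. */
#if defined(__GNUC__) && !defined(__SSE4_1__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

/* Hypothetical helper: per element, returns v_add_neg where a's sign bit
   is set, and a itself otherwise. No integer mask is involved. */
SSE41_TARGET
static __m128 select_by_sign(__m128 a, __m128 v_add_neg)
{
    return _mm_blendv_ps(a, v_add_neg, a);
}
```

The third operand only contributes its sign bits, so any vector whose sign bits encode the condition can be used there directly.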

You only need movemask when you want the sign bits as an integer rather than a vector, e.g. to index a lookup table, or to branch on all vector elements having some property.
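For the branching case, a minimal sketch might look like the following. The helper names are hypothetical, and "negative" here means "sign bit set", so -0.0 and -NaN count too:

```c
#include <xmmintrin.h>  /* SSE: _mm_movemask_ps */

/* Hypothetical helpers. _mm_movemask_ps packs the four sign bits into the
   low 4 bits of an int, with element 0 in bit 0. */
static int any_sign_set(__m128 v)
{
    return _mm_movemask_ps(v) != 0;
}

static int all_sign_set(__m128 v)
{
    return _mm_movemask_ps(v) == 0xF;
}
```

The integer result is convenient for `if` conditions, but it is a one-way trip: getting back to a vector mask costs extra work, which is why staying in vector registers is preferable when the next step is a blend.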

The code's purpose is to change the signs of a vector's elements based on the signs in another vector's corresponding elements.

Use bitwise boolean operations (AND, XOR) to manipulate sign bits:

// Pick your favourite way to express a 0x80000000 FP constant: just the sign bit set.
__m128 sign_v = _mm_and_ps(v, _mm_set1_ps(-0.0f));   // isolate the sign bits of v
__m128 a_times_sign_v = _mm_xor_ps(a, sign_v);       // flip a's sign where v's sign bit was set

This flips the sign of elements in a where v had its sign bit set.
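Wrapped into a function, the AND/XOR approach could be sketched like this (the name `flip_sign_where_negative` is made up for illustration; only baseline SSE is needed):

```c
#include <xmmintrin.h>  /* SSE: _mm_and_ps, _mm_xor_ps, _mm_set1_ps */

/* Hypothetical helper: negates each element of a whose counterpart in v
   has its sign bit set; other elements pass through unchanged. */
static __m128 flip_sign_where_negative(__m128 a, __m128 v)
{
    const __m128 sign_bit = _mm_set1_ps(-0.0f);   /* 0x80000000 in each lane */
    __m128 sign_v = _mm_and_ps(v, sign_bit);      /* keep only v's sign bits */
    return _mm_xor_ps(a, sign_v);                 /* toggle a's sign bits    */
}
```

For example, with a = {1, 2, -3, -4} and v = {-1, 1, -1, 1}, the result is {-1, 2, 3, -4}.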

Note that it treats -0.0 as negative, not zero, and -NaN is also treated as a normal negative. If you don't want that, use _mm_cmplt_ps and left-shift or AND that compare-mask to get a sign-bit mask for xorps.
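A sketch of the compare-based variant, using the AND option (the function name is hypothetical): the compare yields all-ones only where v is strictly less than zero, so -0.0 and NaN inputs leave the corresponding element of a unchanged.

```c
#include <xmmintrin.h>  /* SSE: _mm_cmplt_ps, _mm_and_ps, _mm_xor_ps */

/* Hypothetical helper: negates each element of a whose counterpart in v
   compares less than zero. -0.0 and NaN in v compare false, so no flip. */
static __m128 flip_sign_where_less_than_zero(__m128 a, __m128 v)
{
    __m128 lt_zero = _mm_cmplt_ps(v, _mm_setzero_ps());       /* all-ones or zero per lane */
    __m128 sign_v  = _mm_and_ps(lt_zero, _mm_set1_ps(-0.0f)); /* reduce to the sign bit    */
    return _mm_xor_ps(a, sign_v);
}
```

This costs one extra instruction compared with the purely bitwise version, in exchange for the arithmetic meaning of "negative".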

Peter Cordes
  • Out of curiosity, what library do you use for Log and Exp on pd's? – IamIC May 30 '18 at 03:30
  • @IamIC: last time I needed it, I was using Agner Fog's Vector Class Library (http://agner.org/optimize/), which is GPLed. But I needed a fast approximation to Log() for packed `float`, so I did some research and made a very fast version that was accurate enough for my purposes (as part of `asinh`). See [Efficient implementation of log2(\_\_m256d) in AVX2](https://stackoverflow.com/a/45787548) for an overview of what I used (JRF's polynomial of order 6, with AVX2 FMA). I haven't had a need for packed-`double` or an accurate Log that's anywhere near 1 or 0.5 ulp precision. – Peter Cordes May 30 '18 at 03:44
  • Thank you, Peter. You're amazingly helpful as always. – IamIC May 30 '18 at 03:45
  • (Update on my previous comment, VCL is now licensed with an Apache license, not GPL) – Peter Cordes Jun 25 '22 at 04:49