C SIMD __m128 fabs

Question

How can I implement fabs() for an __m128 vector?

Do I have to use the sign bits to multiply the original vector by 1.0f or -1.0f?

I didn't find any instruction that does it.

I don't want __m256 or __m512; I'm looking for a solution for the __m128 data type.

score 5 · Accepted Answer · answered Aug 13 '23 at 21:12


A single AND-NOT using a prepared constant to turn off the sign bits suffices:

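#include <stdio.h>
#include <xmmintrin.h>

int main(void)
{
    // -0.0f has only the sign bit set (0x80000000), so it works as the mask.
    static const __m128 MinusZero = { -0.f, -0.f, -0.f, -0.f };

    __m128 x = { +1, -2, +3, -4 };

    // _mm_andnot_ps(a, b) computes (~a) & b: every bit of x except the sign bit.
    __m128 y = _mm_andnot_ps(MinusZero, x);

    // Indexing y[i] relies on GCC/Clang vector extensions; it is not portable to MSVC.
    printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]);
}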

Output:

1 2 3 4


Eric Postpischil


And see an earlier duplicate ([Fastest way to compute absolute value using SSE](https://stackoverflow.com/q/32408665)) for my over-complicated attempt to get the compiler to generate the constant on the fly with `pcmpeqd xmm1, xmm1` / `psrld xmm1, 1` instead of loading it from memory. I should really update that answer. – Peter Cordes Aug 14 '23 at 00:41
Also note that indexing a `__m128` like `y[0]` is a GNU extension, or rather a consequence of how GCC/Clang define `__m128` in terms of GNU C vector extensions. That won't compile with MSVC; see [print a \_\_m128i variable](https://stackoverflow.com/q/13257166) for portable ways to access vector elements. (The question isn't about accessing individual vector elements, so it's fine to keep the example compact, as long as we warn future readers that this is non-standard syntax they shouldn't use in portable code.) – Peter Cordes Aug 14 '23 at 00:42

score 0 · Answer 2 · answered Aug 13 '23 at 20:47


I found someone proposing this in a related post, but if you have something better, please propose it. I'm new to SIMD.

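#include <xmmintrin.h>

__m128 _m128_fabs(__m128 x)
{
    __m128 minus_zero = _mm_set1_ps(-0.0f);       // only the sign bit set: 0x80000000 (1U<<31) per element
    __m128 signbits = _mm_and_ps(x, minus_zero);  // isolate each element's sign bit
    __m128 flipped = _mm_xor_ps(x, signbits);     // clear the sign bit: flipped is already |x|

    // Zero the elements equal to zero (-0.0f compares equal to +0.0f, so the same
    // constant serves as zero). Left over from the code this was adapted from; for
    // fabs it is redundant, since a zero input is already +0.0f after the XOR.
    __m128 nonzero = _mm_cmpneq_ps(x, minus_zero);
    return _mm_and_ps(flipped, nonzero);
}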


K V


Why are you doing anything more than clearing the sign bits with a single AND (with bits 0x7FFFFFFF in each element) or a single AND-NOT (with bits 0x80000000 in each element)? – Eric Postpischil Aug 13 '23 at 21:04

@EricPostpischil I'm just new to this, so I didn't know how to do it. – K V Aug 13 '23 at 21:48

When you copy someone's code, you should credit the person by at least linking their post. In this case, that's [How do I get the sign of an intel Architecture SIMD \_\_m128](https://stackoverflow.com/a/48364855). The `__m128 mul_by_signum(__m128 v, __m128 src)` in my answer there takes two inputs: it sign-flips or zeroes one vector based on the values in a *different* one, which is a non-trivial problem, unlike clearing the sign bit. – Peter Cordes Aug 15 '23 at 06:05

score 0 · Answer 3 · answered Aug 13 '23 at 21:07


One way using SSE4.1 instructions:

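#include <smmintrin.h>  // SSE4.1: _mm_blendv_ps

__m128 _mm_fabs_ps(__m128 vec) {
  // Lanes where vec < 0.0f get vec * -1.0f; all other lanes keep vec.
  // Note: -0.0f < 0.0f is false, so -0.0f is returned unchanged (fabsf gives +0.0f).
  return _mm_blendv_ps(vec, _mm_mul_ps(vec, _mm_set_ps1(-1.0f)),
                       _mm_cmplt_ps(vec, _mm_set_ps1(0.0f)));
}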

Not sure how it compares performance-wise to the other answer.

The basic idea is to pick either the number itself or, where it is negative, the number multiplied by -1.0.

edited Aug 13 '23 at 21:15

answered Aug 13 '23 at 21:07

Shawn


Performance-wise, this is total garbage vs. an AND to mask away the sign bits. It still needs a vector constant, but involves 3 instructions instead of 1: one of them runs as multiple uops on most Intel CPUs, and the other two have 3 or 4 cycle latency (in parallel with each other); see https://uops.info/. If you wanted to avoid a vector constant, you might subtract from `0.0` (since all-zero can be created more cheaply), but that would still be terrible. – Peter Cordes Aug 14 '23 at 00:34
See [Fastest way to compute absolute value using SSE](https://stackoverflow.com/q/32408665) (which I should really update to be less weird about trying to convince the compiler to generate the constant on the fly, at least not at the top of the answer.) – Peter Cordes Aug 14 '23 at 00:37
