How to make fabs() for a __m128 vector?
Do I have to use the sign bits to multiply the original vector by 1.0f/-1.0f?
I didn't find any instruction that does it.
I don't want __m256 or __m512. I'm looking for a solution for the __m128 data type.
A single AND-NOT using a prepared constant to turn off the sign bits suffices:
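#include <stdio.h>
#include <xmmintrin.h>

int main(void)
{
    /* -0.0f has only the sign bit set, so it works as a sign-bit mask. */
    static const __m128 MinusZero = { -0.f, -0.f, -0.f, -0.f };
    __m128 x = { +1, -2, +3, -4 };
    /* (~MinusZero) & x clears the sign bit of every lane. */
    __m128 y = _mm_andnot_ps(MinusZero, x);
    /* Indexing a __m128 with [] is a GCC/Clang vector extension. */
    printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]);
}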
Output:
1 2 3 4
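For reuse, the same AND-NOT trick can be wrapped in a small function. This is only a sketch: the names abs_ps and abs4 are made up for illustration, and the result is stored to a plain float array so it does not rely on indexing a __m128 directly.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_set1_ps, _mm_andnot_ps, _mm_loadu_ps, _mm_storeu_ps */

/* Hypothetical helper: clears the sign bit of each of the four lanes. */
static inline __m128 abs_ps(__m128 x)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);  /* only the sign bit set in every lane */
    return _mm_andnot_ps(sign_mask, x);           /* (~sign_mask) & x */
}

/* Hypothetical convenience wrapper for plain arrays of four floats. */
static void abs4(const float in[4], float out[4])
{
    _mm_storeu_ps(out, abs_ps(_mm_loadu_ps(in)));
}
```

Calling abs4 on { 1, -2, 3, -4 } should fill the output array with the absolute values, matching the program above.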
I found someone proposing this in a related post, but if you have something better, please suggest it. I'm new to SIMD.
__m128 _m128_fabs(__m128 x)
{
    __m128 minus_zero = _mm_set1_ps(-0.0f); // only the sign bit set, like epi32(1U<<31)
    __m128 signbits = _mm_and_ps(x, minus_zero); // extract the sign bit of each lane
    __m128 flipped = _mm_xor_ps(x, signbits); // XOR with its own sign bit clears it
    // reuse the zero constant we already have, maybe saving an instruction
    __m128 nonzero = _mm_cmpneq_ps(x, minus_zero); // all-ones where x != 0
    return _mm_and_ps(flipped, nonzero); // zero lanes come out as +0.0
}
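Since only the sign bit has to change, the mask can also be built from an integer constant and applied with a single AND. The following is a minimal sketch assuming SSE2 is available (it is baseline on x86-64); abs_ps_and and abs4_and are hypothetical names, not part of any library.

```c
#include <emmintrin.h>  /* SSE2: _mm_set1_epi32, _mm_castsi128_ps (also pulls in the SSE header) */

/* Hypothetical helper: keeps every bit except the sign bit in each lane. */
static inline __m128 abs_ps_and(__m128 x)
{
    /* 0x7fffffff = all bits set except bit 31, the IEEE-754 sign bit. */
    const __m128 keep_mask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
    return _mm_and_ps(x, keep_mask);
}

/* Hypothetical convenience wrapper for plain arrays of four floats. */
static void abs4_and(const float in[4], float out[4])
{
    _mm_storeu_ps(out, abs_ps_and(_mm_loadu_ps(in)));
}
```

The cast intrinsic only reinterprets the bits; it does not convert the integer values to floats.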
One way using SSE4.1 instructions:
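#include <smmintrin.h> // SSE4.1, for _mm_blendv_ps

__m128 _mm_fabs_ps(__m128 vec) {
    // Where vec < 0 take vec * -1.0f, otherwise keep vec.
    return _mm_blendv_ps(vec, _mm_mul_ps(vec, _mm_set_ps1(-1.0f)),
                         _mm_cmplt_ps(vec, _mm_set_ps1(0.0f)));
}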
I'm not sure how it compares performance-wise to the other answer.
The basic idea is to pick, per lane, either the original value when it is non-negative or the value multiplied by -1.0 when it is negative.
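If SSE4.1 is not available, a similar "pick x or its negation" idea can be sketched with plain SSE by taking the per-lane maximum of x and 0 - x. Note that this is a different technique from the blend above, shown only as an illustration; abs_ps_max and abs4_max are hypothetical names.

```c
#include <xmmintrin.h>  /* SSE: _mm_setzero_ps, _mm_sub_ps, _mm_max_ps */

/* Hypothetical helper: per-lane max(x, -x), using only SSE instructions. */
static inline __m128 abs_ps_max(__m128 x)
{
    __m128 neg = _mm_sub_ps(_mm_setzero_ps(), x);  /* 0 - x */
    return _mm_max_ps(x, neg);                     /* the larger of x and -x */
}

/* Hypothetical convenience wrapper for plain arrays of four floats. */
static void abs4_max(const float in[4], float out[4])
{
    _mm_storeu_ps(out, abs_ps_max(_mm_loadu_ps(in)));
}
```

This needs a subtract and a max rather than one bitwise operation, so the mask-based versions are likely the simpler choice when only the absolute value is wanted.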