How to compare __m128 types?

Question

__m128 a;
__m128 b;

How do I code a != b?

Which should I use: _mm_cmpneq_ps or _mm_cmpneq_ss?

How do I process the result?

I can't find adequate docs.

I hope you understand why it's not a good idea to compare floating point values for equality/inequality ? (This applies to both scalar code and SIMD code.) — Paul R, May 18 '11 at 09:23
Docs are available from Intel and AMD. Look for processor manuals. — Dietrich Epp, May 18 '11 at 09:24

Paul R · Accepted Answer · 2011-05-18T10:05:56.757

18

You should probably use _mm_cmpneq_ps, which compares all four elements (_mm_cmpneq_ss compares only the lowest element). However, the interpretation of comparisons is a little different with SIMD code than with scalar code. Do you want to test for any corresponding element not being equal? Or all corresponding elements not being equal?

To test the results of the 4 comparisons from _mm_cmpneq_ps you can use _mm_movemask_epi8, which yields a 16-bit mask with 4 bits per float element.

Note that comparing floating point values for equality or inequality is usually a bad idea, except in very specific cases.

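#include <emmintrin.h> // SSE2: __m128i, _mm_movemask_epi8
#include <stdint.h>    // uint16_t

// The C-style cast works with gcc; MSVC needs _mm_castps_si128(...) instead.
__m128i vcmp = (__m128i)_mm_cmpneq_ps(a, b); // compare a, b for inequality
uint16_t test = _mm_movemask_epi8(vcmp); // extract results of comparison (4 bits per element)
if (test == 0xffff) {
    // *all* elements not equal
} else if (test != 0) {
    // *some* elements not equal
} else {
    // no elements not equal, i.e. all elements equal
}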

For documentation you want these two volumes from Intel:

Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2A: Instruction Set Reference, A-M

Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2B: Instruction Set Reference, N-Z

edited May 18 '11 at 10:05

answered May 18 '11 at 09:26


BTW, in Visual Studio the regular C++ casts don't work, but you can still cast to __m128i using the intrinsic _mm_castps_si128 – Virgil Jun 10 '11 at 14:03
@Virgil: yes, this is just one of several problem areas with the Visual Studio C/C++ compilers and SSE code - there are some nasty (and seemingly arbitrary) ABI restrictions too. I recommend using gcc or better still Intel's ICC, and avoid Windows if humanly possible. ;-) – Paul R Jun 10 '11 at 14:40
You'd normally want `_mm_movemask_ps` here, to get a simple 4-bit mask in the low bits of a 32-bit integer. – Peter Cordes Feb 27 '19 at 04:56
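
As a rough sketch of the _mm_movemask_ps variant mentioned in the comment above: it produces one bit per float element, so the "any" and "all" tests become comparisons against 0 and 0xF. The helper names (neq_mask, any_neq, all_neq, check_neq_helpers) are made up for illustration and are not part of any library.

```cpp
#include <xmmintrin.h> // SSE: __m128, _mm_cmpneq_ps, _mm_movemask_ps
#include <cassert>

// Bit i of the result is set when lane i of a and b compare "not equal".
// A NaN lane always compares not equal, even against itself.
inline int neq_mask(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpneq_ps(a, b));
}

// True if at least one lane differs.
inline bool any_neq(__m128 a, __m128 b) { return neq_mask(a, b) != 0; }

// True if every lane differs.
inline bool all_neq(__m128 a, __m128 b) { return neq_mask(a, b) == 0xF; }

// Hypothetical usage check for the helpers above.
inline void check_neq_helpers()
{
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    __m128 b = _mm_setr_ps(1.0f, 2.0f, 3.0f, 5.0f); // differs in lane 3 only
    assert(neq_mask(a, b) == 0x8);          // only bit 3 is set
    assert(any_neq(a, b) && !all_neq(a, b));
    assert(!any_neq(a, a));                 // identical non-NaN vectors
}
```

Compared with _mm_movemask_epi8, this avoids the cast to __m128i entirely and needs only SSE rather than SSE2.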

Pixelchemist · Answer 2 · 2017-02-08T15:54:42.940

The answer also depends on whether you want exact inequality, in which case you'd use something along the lines of what @PaulR has shown:

bool fneq128_a (__m128 const& a, __m128 const& b)
{
    // returns true if at least one element in a is not equal to 
    // the corresponding element in b
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) != 0xF;
}

or whether you want to use some epsilon, so that elements are still considered "equal" if they differ by less than the threshold:

bool fneq128_b (__m128 const& a, __m128 const& b, float epsilon = 1.e-8f)
{
    // epsilon vector
    auto eps = _mm_set1_ps(epsilon);
    // absolute of difference of a and b
    auto abd = _mm_andnot_ps(_mm_set1_ps(-0.0f), _mm_sub_ps(a, b));
    // compare abd to eps
    // returns true if one of the elements in abd is not less than 
    // epsilon
    return _mm_movemask_ps(_mm_cmplt_ps(abd, eps)) != 0xF;
}

Example:

auto a = _mm_set_ps(0.0, 0.0, 0.0, 0.0);
auto b = _mm_set_ps(0.0, 0.0, 0.0, 1.e-15);
std::cout << fneq128_a(a, b) << ' ' << fneq128_b(a, b) << "\n";

Prints:

1 0

Dick Bertrand · Answer 3 · 2020-11-01T18:54:51.860

-1

Peter is right!!! Tests against values that are 0.0f can fail under the previous approach.

Please consider this macro:

#define ISEQUAL(A, B) \
    _mm_testz_si128(_mm_xor_si128(_mm_castps_si128(A), _mm_castps_si128(B)), \
                    _mm_xor_si128(_mm_castps_si128(A), _mm_castps_si128(B)))

This results in 2 instructions. Note that _mm_testz_si128 requires SSE4.1.

edited Nov 01 '20 at 18:54

answered Apr 01 '17 at 21:03


That's not a test for equality, that's testing if `~a & b == all-zero` (Across all 128 bits). See also [Can PTEST be used to test if two registers are both zero or some other condition?](//stackoverflow.com/q/43712243) – Peter Cordes Feb 27 '19 at 04:59
Making it a macro is inconvenient; you might as well make it an inline function so you can use a temporary to hold the bitwise XOR result. But yes, this should work if you want to test for bitwise equality, considering NaN == NaN (with the same payload), and `-0.0 != 0.0`. Instead of IEEE floating-point equality rules. – Peter Cordes Nov 02 '20 at 04:49
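
A minimal sketch of the inline-function idea from the comment above, contrasting bitwise equality with IEEE equality. Note the substitution: instead of _mm_testz_si128 (SSE4.1) it uses an SSE2 integer compare plus _mm_movemask_epi8, so it builds without extra compiler flags on x86-64. The function names are hypothetical.

```cpp
#include <emmintrin.h> // SSE2: _mm_castps_si128, _mm_cmpeq_epi32, _mm_movemask_epi8
#include <cassert>
#include <limits>

// IEEE equality on all four lanes: NaN != NaN, and -0.0f == 0.0f.
inline bool all_equal_ieee(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}

// Bitwise equality on all 128 bits: a NaN equals a NaN with the same
// payload, and -0.0f != 0.0f.
inline bool all_equal_bitwise(__m128 a, __m128 b)
{
    __m128i eq = _mm_cmpeq_epi32(_mm_castps_si128(a), _mm_castps_si128(b));
    return _mm_movemask_epi8(eq) == 0xFFFF;
}

// Hypothetical usage check showing where the two notions disagree.
inline void check_equality_flavours()
{
    __m128 pz  = _mm_set1_ps(0.0f);
    __m128 nz  = _mm_set1_ps(-0.0f);
    __m128 nan = _mm_set1_ps(std::numeric_limits<float>::quiet_NaN());
    assert(all_equal_ieee(pz, nz) && !all_equal_bitwise(pz, nz));
    assert(!all_equal_ieee(nan, nan) && all_equal_bitwise(nan, nan));
}
```

Which of the two is appropriate depends on whether NaN and signed-zero inputs should follow floating-point rules or be treated as raw bit patterns.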
