I am writing software that needs to compare two `__m256` vectors for equality. However, I would like there to be a margin of error of +/- 0.00001; e.g., 3.00001 should be considered equal to 3.00002. Is there a simple way to do this using SSE/AVX/AVX2 instructions?
2 Answers
Subtract the values being compared, mask out the sign with `*_and_*` or `*_andnot_*`, and compare with `*_cmpgt_*` against your margin populated with `*_set1_*`.
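Put together, a minimal sketch of that recipe might look like the following (assuming AVX; the function name is mine, and note that 256-bit float compares go through `_mm256_cmp_ps` with a predicate rather than a dedicated `cmpgt` intrinsic):

```c
#include <immintrin.h>

/* Lane-wise |a - b| <= margin. Returns all-ones in each lane that is
   "approximately equal", all-zeros otherwise. */
static inline __m256 approx_equal_ps(__m256 a, __m256 b)
{
    const __m256 sign_mask = _mm256_set1_ps(-0.0f);      /* only the sign bit set */
    const __m256 margin    = _mm256_set1_ps(0.00001f);
    __m256 diff     = _mm256_sub_ps(a, b);
    __m256 abs_diff = _mm256_andnot_ps(sign_mask, diff); /* clear sign bit = fabs */
    return _mm256_cmp_ps(abs_diff, margin, _CMP_LE_OQ);
}
```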

- I implemented this suggestion and it worked. However, I wonder if using a bit mask to mask out the sign bit is the best move; though I suppose all the Intel/AMD processors that use these AVX instructions will also use the IEEE floating-point format. – RandomEagle Aug 27 '23 at 21:49
- @RandomEagle: Yes, `[v]andps` is the most efficient way to implement `abs()`. [C simd \_m128 fabs](https://stackoverflow.com/q/76895133). As you say, SSE/AVX math instructions like `subps` and `cmpltps` definitely use IEEE binary32 floating point, and in general C implementations that support Intel intrinsics will use that type as their `float`. – Peter Cordes Aug 28 '23 at 03:35
- You could multiply the difference with `2.0/tolerance` and `ptest` (`_mm256_testz_si256`) that against `2.0`. If `ptest` set the `z` flag, every difference is below the tolerance (see the sketch after these comments). (Saves one µop, if you wanted to `ptest` the `cmpgt` result anyway.) And if you knew one of the values ahead of time, the subtraction and multiplication could be fused into an FMA. – chtz Aug 28 '23 at 09:04
- @chtz: You generally *don't* want to `ptest` a `cmpgtps` result if you're branching on it. `movmskps eax, xmm0` / `test eax,eax` is also 2 uops like `ptest`, but the `test` can macro-fuse with a JCC. It's break-even for uops and saves code-size if you're using integer `cmovcc` or `setcc` on the FLAGS result, though. Still, cool trick to check bits of the biased exponent field. – Peter Cordes Aug 31 '23 at 08:02
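A hedged sketch of the two options discussed in these comments, assuming AVX (the function names are mine). The `ptest` variant relies on the fact that `2.0f` has only bit 30 set, the top bit of the biased exponent field, and every float with magnitude >= 2.0 has that bit set; up to rounding exactly at the boundary, the scaled difference therefore flags out-of-tolerance lanes:

```c
#include <immintrin.h>
#include <stdbool.h>

/* Branch-friendly version: movemask + test macro-fuses with a branch. */
static inline bool all_close_movemask(__m256 a, __m256 b, float tol)
{
    __m256 abs_diff = _mm256_andnot_ps(_mm256_set1_ps(-0.0f),
                                       _mm256_sub_ps(a, b));
    __m256 gt = _mm256_cmp_ps(abs_diff, _mm256_set1_ps(tol), _CMP_GT_OQ);
    return _mm256_movemask_ps(gt) == 0;  /* no lane exceeded the tolerance */
}

/* chtz's ptest trick: after scaling by 2.0/tol, any lane with |a-b| >= tol
   has magnitude >= 2.0 and therefore has bit 30 set (regardless of sign),
   so one vptest against the bit pattern of 2.0f answers the question.
   NaN/Inf lanes also have bit 30 set, so they count as "not close". */
static inline bool all_close_ptest(__m256 a, __m256 b, float tol)
{
    __m256 scaled = _mm256_mul_ps(_mm256_sub_ps(a, b),
                                  _mm256_set1_ps(2.0f / tol));
    const __m256i exp_bit = _mm256_castps_si256(_mm256_set1_ps(2.0f));
    return _mm256_testz_si256(_mm256_castps_si256(scaled), exp_bit);
}
```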
You might also look carefully at why you need this epsilon in the first place. (Very) skilled floating-point practitioners treat floating point as a precise language rather than an approximate one, and craft their algorithms with intent. In such cases, epsilon may not be needed: either their answer is correctly rounded or nearly so, or they have done enough error analysis to know what to expect. Without that, the programmer is often confronted with the problem that he isn't sure what the right value for epsilon should really be. Let hacks ensue! Epsilon is the floating-point analogue of C programmer's disease. In casual hands, it is not what one would call engineering.
Obviously, there are some bulk arithmetic problems, like matrix multiplication, where this is unrealistic: error due to blocking, FMA vs. no FMA, operation order, etc. is expected and avoidable only at substantial performance cost. Other times, there is excessive error because the author has not been careful to avoid issues like catastrophic cancellation, and some attentive bug fixing will solve the problem. I'll leave it up to you to decide which situation you are in.
Finally, if you are going to toss in an epsilon, I will add from some experience writing things like the OpenCL conformance test used by dozens of companies on a wide variety of hardware, that a simple epsilon isn’t good enough. In most cases, you really should be thinking in terms of relative error - that is, ULPs - in which case you are looking for
`|actual - expected| < |expected * FLT_EPSILON * ulps|`
which is how the error is usually really shaped. However, that doesn't cover all of it. Because of catastrophic cancellation, there will also be an absolute error threshold around `maximum_realistic_result * FLT_EPSILON`, and another one near `FLT_MIN` to cover excessive loss of precision due to abrupt underflow (GPUs), or around `FLT_MIN * FLT_EPSILON` to cover truncation under gradual underflow and the excessive loss of precision associated with it, depending on whether the error is caused by software or hardware.
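As a rough illustration only, here is a scalar sketch combining those bounds by taking whichever is loosest; the function name, the `max_result` parameter, and the choice to take the max of the three thresholds are my assumptions, not part of the answer:

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Pass if the error is within `ulps` ULPs of `expected`, or within the
   absolute floors for cancellation and underflow described above.
   `max_result` is a hypothetical, caller-supplied bound on realistic results. */
static bool within_tolerance(float actual, float expected,
                             float ulps, float max_result)
{
    float rel_tol = fabsf(expected) * FLT_EPSILON * ulps; /* relative (ULP) bound */
    float abs_tol = max_result * FLT_EPSILON;             /* cancellation floor */
    float sub_tol = FLT_MIN * FLT_EPSILON;                /* gradual-underflow floor */
    float err = fabsf(actual - expected);
    return err <= fmaxf(rel_tol, fmaxf(abs_tol, sub_tol));
}
```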
There is no substitute for doing the math.

- I would add that finding the error in ULPs is actually really easy, supposing the two numbers are finite and positive, and in the same exponent range; after removing Infinities and NaNs, you can just subtract the numbers as 32-bit integers (64-bit for doubles) and take the absolute value. – Ovinus Real Sep 01 '23 at 17:31
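A minimal sketch of the trick from the comment above, assuming two finite, positive floats (the function name is mine); since both bit patterns are then non-negative and below `0x7F800000`, the subtraction cannot overflow:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ULP distance between two finite, positive floats: IEEE-754 bit patterns
   of same-sign floats are monotonically ordered as integers, so their
   integer difference counts the representable values between them. */
static int32_t ulp_distance(float a, float b)
{
    int32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);  /* bit-cast without aliasing UB */
    memcpy(&ib, &b, sizeof ib);
    return abs(ia - ib);         /* safe: both are in [0, 0x7F7FFFFF] */
}
```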