
My goal is to vectorize comparisons so I can use the results as masks later.

The problem is that _mm256_cmp_pd returns NaN instead of 1.0. What is the correct way to do comparisons in AVX2?

AVX2 code:

__m256d _numberToCompare = _mm256_set1_pd(1.0);
__m256d _compareConditions = _mm256_set_pd(0.0, 1.0, 2.0, 3.0);

__m256d _result = _mm256_cmp_pd(_numberToCompare, _compareConditions, _CMP_LT_OQ); //a < b ordered (non-signalling) 
alignas(32) double res[4]; // _mm256_store_pd requires 32-byte alignment
_mm256_store_pd(&res[0], _result);
for (auto i : res) {
    std::cout << i << '\t';
}
   
__m256d _result2 = _mm256_cmp_pd(_numberToCompare, _compareConditions, _CMP_LE_OQ); //a <= b ordered (non-signalling)   
alignas(32) double res2[4]; // _mm256_store_pd requires 32-byte alignment
_mm256_store_pd(&res2[0], _result2);
for (auto i : res2) {
    std::cout << i << '\t';
}
std::cout << '\n';

GodBolt link

Expected result (one I would have in scalar code):

0 0 1 1
0 1 1 1

Actual result:

-nan    -nan    0       0
-nan    -nan    -nan    0
  1. Why is the result of the comparison NaN?
  2. What is the correct way to get the expected result?
Peter Cordes
Vladislav Kogan
    Near duplicate of [SSE comparison returns vector of NANs](https://stackoverflow.com/q/57091959). For examples of using compare results, see things like [Can I make C++ generate cmpps instruction without inline assembly?](https://stackoverflow.com/q/35693277), or [Intel intrinsics: vector comparison result to array of bool conversion](https://stackoverflow.com/q/72735770) showing integer manipulation of the cmp result. Or [SIMD instructions for floating point equality comparison (with NaN == NaN)](https://stackoverflow.com/q/34951714) (in asm, but intrinsics exist for the insns.) – Peter Cordes Nov 15 '22 at 05:27
    Another good example: [How do I get the sign of an intel Architecture SIMD \_\_m128](https://stackoverflow.com/a/48364855) - the question wanted a float with values -1.0 / 0 / +1.0 , which was an X-Y problem. My answer guessed that they wanted to multiply another value by that, but you can apply that more easily using bitwise ops directly. Negating a float is just a matter of XORing the sign bit with 1 (i.e. xor with `-0.0`) – Peter Cordes Nov 15 '22 at 06:06
  • Thanks. However, I don't think it's a duplicate. Yes, SSE/AVX are all SIMD, but AVX2 is a different instruction-set extension. An answer in AVX2 syntax would be clearer. I think AVX2 deserves a separate Q&A page (and I tried to make one). – Vladislav Kogan Nov 18 '22 at 02:20
    Yeah, none of them were quite similar enough to close as a duplicate, but the concept is the same from SSE4.1 (`blendvpd`) through AVX2. `_mm` vs. `_mm256` is a trivial difference vs. understanding the concept of bitwise masking to zero out an element or not, or using a blend to select between two values. Yes, your self-answered Q&A ([How to do mask / conditional / branchless arithmetic operations in AVX2](https://stackoverflow.com/q/74454057)) is hopefully useful to future readers for showing that. – Peter Cordes Nov 18 '22 at 02:27

1 Answer


Ad 1: The result is a bitmask: all ones (0xFFFF'FFFF'FFFF'FFFF, which reinterpreted as a double prints as -nan) for true, and all zeros for false. It can be used directly with bitwise operators.

Ad 2: You can compute _result = _mm256_and_pd(_result, _mm256_set1_pd(1.0)) if you really need 1.0 and 0.0 (but usually, using the bitmask directly is more efficient).

Also be aware that _mm256_set_pd takes the arguments in "big-endian" order, i.e., the element with the highest address is the first argument (don't ask me why Intel decided that way) -- you can use _mm256_setr_pd instead if you prefer little-endian.

chtz