2

I have some SIMD code that's checking for equality between vars but I'm getting different results between GCC and clang when NaNs are involved:

bool equal(__m128 a, __m128 b){
    return _mm_comieq_ss(a,b) == 1;
}

int main()
{
    __m128 a, b, c;

    a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
    b = _mm_set_ss(1.0f);
    c = _mm_set_ss(1.0f);
    
    std::cout << "comieq(a,b):" << equal(a,b) << std::endl;
    std::cout << "comieq(b,a):" << equal(b,a) << std::endl;
    std::cout << "comieq(b,c):" << equal(b,c) << std::endl;
    std::cout << "comieq(a,a):" << equal(a,a) << std::endl;

    return 0;
}

Clang and GCC return different values:

gcc:
comieq(a,b):1
comieq(b,a):1
comieq(b,c):1
comieq(a,a):1

clang:
comieq(a,b):0
comieq(b,a):0
comieq(b,c):1
comieq(a,a):0

Does anyone have an idea why this is happening? I just want to check if two regs are equal or not; is there an alternative way to do this that's consistent?

godbolt: https://godbolt.org/z/ETKenE45f

Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
Biggy Smith
  • 910
  • 7
  • 14
  • 1
    @AdrianMole: Intel's documentation for the intrinsic specifies the behaviour that clang implements: act like an IEEE `==` compare, false if the comparison is unordered. (https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=MMX,SSE_ALL,AVX_ALL,Other&ig_expand=6243,519,365,365,164,24,1424,1424&text=comiss). But GCC's asm only checks the ZF result of `comiss` / `ucomiss`, ignoring PF, so it ends up treating unordered as equal. (https://www.felixcloutier.com/x86/comiss). This is a GCC bug, other than intentional bug-compatibility with old compilers, not current docs. – Peter Cordes Mar 23 '23 at 03:09
  • 1
    @AdrianMole: Your answer seems complete enough, now, with that quote from the LLVM devs explaining why GCC still gets it wrong, because apparently ICC also did historically. I didn't know that history when I first commented, and you've edited to include my update, so it also covers why Clang changed to get it right. BTW, a more efficient implementation in asm would be `cmpeqss xmm,xmm` / `movd eax, xmm` instead of integer `setcc` on both ZF and PF. (Except with `-ffast-math`, then `comiss` / checking ZF wins.) – Peter Cordes Mar 23 '23 at 03:31

1 Answers1

3

The different handling of return values when comparing NaN values was specifically changed in Clang 3.9.0. Related Link.

Although one would expect intrinsic functions to be just that – intrinsic to the CPU and not compiler-dependent, the comiss instruction produces a result in multiple bits of FLAGS. Different intrinsics check different predicates to define a single boolean return value; in asm it would be up to the programmer to use a combination of instructions like je, jb, and/or jp or setcc / cmovcc to use the compare result.

What is happening here is that GCC is checking only the ZF (zero flag) value, whereas Clang is also (correctly) checking the PF ('parity' flag: set if the comparison is unordered, i.e. one of the inputs is NaN. This matches the way integer FLAGS were set by P6 x87 fcomi, in turn matching older x87 fcom / fstsw ax / sahf).

I shall offer a short quotation from the discussion linked above, which may shed some light on the reasoning behind the decision made by the LLVM (clang) team:

In Clang 3.8.0 and before, comparing two scalars of which at least one is a NaN would return 1. This is also the behavior that GCC, Visual Studio, and our current Emscripten code implements. This behavior is unintuitive in the sense that comparing NaNs in floats have the opposite tradition in IEEE-754, i.e. "nothing is equal to a NaN".

Intel is the original author of these intrinsics, and it must be admitted that these functions have long suffered from poor documentation. Intel doesn't spec in detail how these intrinsics should work with respect to NaNs (https://software.intel.com/en-us/node/514308), but presumably the reference implementation in their own compiler was held as the ground truth. The behavior that GCC, VS and Clang <= 3.8 each follow likely comes from adhering to the original code as implemented in Intel's compilers, where _mm_comieq_ss is implemented to perform the COMISS instruction and return the resulting zero flag (ZF) register state as the output int value of the intrinsic function. The COMISS instruction itself is though well documented since it's part of the ISA, and is shown e.g. at http://x86.renejeschke.de/html/file_module_x86_id_44.html. This shows the origin of the unexpected NaN behavior, since the zero flag is set if the comparison is equal, or if the comparison result is unordered, i.e. at least one of the registers is a NaN.


Following the comment from Peter Cordes, it is now clear that the (modified) clang behaviour is correct and the "poor documentation" from Intel referred to in the above citation has been corrected. The Intel documentation for _mm_comieq_ss now makes it clear that any NaN value present should yield a return value of zero:

Operation

RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
  • 2
    That comment is out of date; the current Intel intrinsics guide *does* detail the fact that NaNs make the `eq` intrinsic predicate false. `RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0`. https://godbolt.org/z/zTn8aj84v shows ICC 16.0.3 follow that documented behaviour, although ICC13 does only check ZF like they discuss in that 2016 comment. I commented on the LLVM issue with an update (https://github.com/emscripten-core/emscripten/issues/4435#issuecomment-1480546251) – Peter Cordes Mar 23 '23 at 03:21
  • 1
    found this on gcc bug tracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98612 says will be fixed after intrinsics guide updated, but not good round to it yet... – Biggy Smith Mar 23 '23 at 17:06
  • did a bit more searching, seems this is fixed in gcc 13+ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106113 godbolt: https://godbolt.org/z/qP891ceGG – Biggy Smith Mar 25 '23 at 23:06