Here's my code:
int f(double x, double y)
{
    return std::isnan(x) || std::isnan(y);
}
If you're using C instead of C++, just replace std:: with __builtin_ (don't simply remove std::, for reasons shown here: Why does GCC implement isnan() more efficiently for C++ <cmath> than C <math.h>?).
Here's the assembly:
ucomisd %xmm0, %xmm0 ; set parity flag if x is NAN
setp %dl ; copy parity flag to %edx
ucomisd %xmm1, %xmm1 ; set parity flag if y is NAN
setp %al ; copy parity flag to %eax
orl %edx, %eax ; OR one byte of each result into a full-width register
Now let's try an alternative formulation that does the same thing:
int f(double x, double y)
{
    return std::isunordered(x, y);
}
Here's the assembly for the alternative:
xorl %eax, %eax
ucomisd %xmm1, %xmm0
setp %al
This is great--we cut the generated code almost in half (five instructions down to three)! This works because ucomisd sets the parity flag if either of its operands is NaN, so we can test two values at a time, SIMD-style.
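As a quick sanity check that the two formulations really are interchangeable, here is a minimal sketch that compares them over a handful of sample values. The names f_isnan, f_unordered and formulations_agree are made up for this illustration, and the sample set is only a spot check, not a proof:

```cpp
#include <cmath>
#include <limits>

// The original formulation: two separate NaN tests OR'd together.
int f_isnan(double x, double y)
{
    return std::isnan(x) || std::isnan(y);
}

// The alternative formulation: a single unordered comparison.
int f_unordered(double x, double y)
{
    return std::isunordered(x, y);
}

// Hypothetical helper: returns true if both formulations agree on every
// pair drawn from a small set of finite, infinite, and NaN samples.
bool formulations_agree()
{
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double inf = std::numeric_limits<double>::infinity();
    const double samples[] = {0.0, -0.0, 1.5, -inf, inf, nan, -nan};
    for (double x : samples)
        for (double y : samples)
            if (f_isnan(x, y) != f_unordered(x, y))
                return false;
    return true;
}
```

Note that this assumes the code is built without -ffast-math (or -ffinite-math-only), since those options let the compiler assume NaNs never occur.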
You can see code like the original version in the wild, for example: https://svn.r-project.org/R/trunk/src/nmath/qnorm.c
If we could make GCC smart enough to combine two isnan() calls everywhere, that would be pretty cool. My question is: can we, and how? I have some idea of how compilers work, but I don't know where in GCC this sort of optimization could be performed. The basic idea is that whenever a pair of isnan() (or __builtin_isnan) calls is OR'd together, GCC should emit a single ucomisd instruction that takes both operands at once.
Edited to add some research prompted by Basile Starynkevitch's answer:
If I compile with -fdump-tree-all, I find two files which seem relevant. First, *.gimple contains this (and a bit more):
D.2229 = x unord x;
D.2230 = y unord y;
D.2231 = D.2229 | D.2230;
Here we can clearly see that GCC knows it will pass (x, x) to isunordered(). If we want to optimize by transforming at this level, the rule would be roughly: "Replace a unord a | b unord b with a unord b." This is what you get when compiling my second version of the code:
D.2229 = x unord y;
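To make the proposed rule concrete, here is a toy sketch of the rewrite on a made-up expression type. Unord, is_self_compare and fold_or_of_isnan are hypothetical names for illustration only; none of this is GCC's internal representation or API, and a real pass would presumably also have to check that both operands have the same floating-point type:

```cpp
#include <string>

// Hypothetical, simplified stand-in for an "a unord b" expression.
// This is NOT a GCC data structure; operands are just variable names.
struct Unord {
    std::string lhs;
    std::string rhs;
};

// True when the expression has the shape "a unord a", i.e. an isnan() test.
bool is_self_compare(const Unord& e)
{
    return e.lhs == e.rhs;
}

// Tries to fold "(a unord a) | (b unord b)" into "a unord b".
// Returns true and writes the folded expression to `out` when the pattern
// matches; otherwise returns false and leaves `out` untouched.
bool fold_or_of_isnan(const Unord& left, const Unord& right, Unord& out)
{
    if (!is_self_compare(left) || !is_self_compare(right))
        return false;
    out = Unord{left.lhs, right.lhs};
    return true;
}
```

Feeding it the two statements from the first dump (x unord x and y unord y) yields x unord y, which matches the dump for the isunordered() version.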
Another interesting file is *.original:
return <retval> = (int) (x unord x || y unord y);
That's actually the entire non-comment content of the file generated by -fdump-tree-original. And for the better source code it looks like this:
return <retval> = x unord y;
Clearly the same sort of transformation can be applied (just here it's || instead of |).
But unfortunately if we modify the source code to e.g.:
    if (__builtin_isnan(x))
        return true;
    if (__builtin_isnan(y))
        return true;
    return false;
Then we get quite different Gimple and Original output files, though the final assembly is the same as before. So maybe it's better to attempt this transformation at a later stage in the pipeline? The *.optimized file (among others) shows the same code for the version with "if"s as for the original version, so that's promising.