20

Here's my code:

int f(double x, double y)
{
  return std::isnan(x) || std::isnan(y);
}

If you're using C instead of C++, just replace std:: with __builtin_ (don't simply remove std::, for reasons shown here: Why does GCC implement isnan() more efficiently for C++ <cmath> than C <math.h>?).

Here's the assembly:

ucomisd %xmm0, %xmm0 ; set parity flag if x is NAN
setp    %dl          ; copy parity flag to %edx
ucomisd %xmm1, %xmm1 ; set parity flag if y is NAN
setp    %al          ; copy parity flag to %eax
orl     %edx, %eax   ; OR one byte of each result into a full-width register

Now let's try an alternative formulation that does the same thing:

int f(double x, double y)
{
  return std::isunordered(x, y);
}

Here's the assembly for the alternative:

xorl    %eax, %eax
ucomisd %xmm1, %xmm0
setp    %al

This is great--we cut the generated code almost in half! This works because ucomisd sets the parity flag if either of its operands is NAN, so we can test two values at a time, SIMD-style.

You can see code like the original version in the wild, for example: https://svn.r-project.org/R/trunk/src/nmath/qnorm.c

If we could make GCC smart enough to combine two isnan() calls everywhere, that would be pretty cool. My question is: can we, and how? I have some idea of how compilers work, but I don't know where in GCC this sort of optimization could be performed. The basic idea is whenever there is a pair of isnan() (or __builtin_isnan) calls OR'd together, it should emit a single ucomisd instruction using the two operands at the same time.

Edited to add some research prompted by Basile Starynkevitch's answer:

If I compile with -fdump-tree-all, I find two files which seem relevant. First, *.gimple contains this (and a bit more):

D.2229 = x unord x;
D.2230 = y unord y;
D.2231 = D.2229 | D.2230;

Here we can clearly see that GCC knows it will pass (x, x) to isunordered(). If we want to optimize by transforming at this level, the rule would be roughly: "Replace a unord a | b unord b with a unord b." This is what you get when compiling my second C code:

D.2229 = x unord y;

Another interesting file is *.original:

return <retval> = (int) (x unord x || y unord y);

That's actually the entire non-comment file generated by -fdump-tree-original. And for the better source code it looks like this:

return <retval> = x unord y;

Clearly the same sort of transformation can be applied (just here it's || instead of |).

But unfortunately if we modify the source code to e.g.:

if (__builtin_isnan(x))
  return true;
if (__builtin_isnan(y))
  return true;
return false;

Then we get quite different Gimple and Original output files, though the final assembly is the same as before. So maybe it's better to attempt this transformation at a later stage in the pipeline? The *.optimized file (among others) shows the same code for the version with "if"s as for the original version, so that's promising.

Community
  • 1
  • 1
John Zwinck
  • 239,568
  • 38
  • 324
  • 436
  • 1
    Of course it's *possible* - but that doesn't mean it's desirable given the added complexity, overhead, code to maintain, frequency the optimisation will be used etc.. Anyway, suggesting it ***to the GCC devs*** is surely the next step in having it considered, not posting it here. – Tony Delroy Sep 26 '14 at 08:02
  • 1
    @TonyD: If you know a GCC dev who is willing and able and has the time to implement this, absolutely please pass this to them or tell me their email address and I will do it. Otherwise, the question is about whether I could do it myself without an inordinate amount of effort (I'm aware the learning curve for such things is very steep). There is already one on-topic, useful answer posted here that taught me something I would not have learned by merely submitting this as a GCC bug. – John Zwinck Sep 26 '14 at 08:11
  • 2
    In gcc-5, it might be as simple as `(simplify (or (unordered @0 @0) (unordered @1 @1)) (unordered @0 @1))` in one of the .pd files (well, probably not for the last version with `if`). Please file a PR. – Marc Glisse Sep 26 '14 at 15:51
  • @MarcGlisse: I filed it at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63387 with your suggestion for GCC 5, thank you. – John Zwinck Sep 27 '14 at 03:12

2 Answers2

10

This optimization is not only possible, it is now available in gcc-6: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=222077

Marc Glisse
  • 7,550
  • 2
  • 30
  • 53
3

There are two questions:

  • is the optimization you are proposing always legal in the strict C++11 standard (I don't know).

  • can GCC be customized by adding such an optimization: yes! You could extend it using MELT -e.g. write your own MELT extension doing that- or with your own GCC plugin coded (painfully) in C++ .

However, adding an extra optimization in GCC is a significant work (even with MELT): you need to understand the internals of GCC. So it is more than a week of work probably.

And I am not sure that such an optimization is really worth the effort.

Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
  • Thanks, I see you're an/the author of MELT. It will clearly take some doing before I can grasp it, but I did run `gcc -fdump-tree-all` and edited some of my findings into the question. – John Zwinck Sep 26 '14 at 08:42
  • Well, I tried building MELT 1.0 (for use with GCC 4.7), but it gave me a lot of trouble. First I needed a newer `unifdef` than my system had (mine didn't support `-o` which you could probably overcome in your build system). Then it had an undefined reference in melt.so to something in libgmp, so I added that to the Makefile. then it complained of missing `meltbuild-stage0-quicklybuilt/warmelt-first+meltdesc.c` so I did make clean and now I get `melt-runtime.h:675:24: fatal error: meltrunsup.h: Too many levels of symbolic links`. – John Zwinck Sep 26 '14 at 09:22
  • I also tried building MELT 1.1 with GCC 4.9. I had to add `-lrt` to the link line for `melt.so` (otherwise undef-ref to `clock_gettime`), then I got the same error about `warmelt-first+meltdesc.c`. Next I got an error because you rely on `pstree -s` and my system's `pstree` doesn't have that option. I'm using Ubuntu 10.04. Sorry, old system, I can't do anything about it. My other system is a Mac and that uses Clang. :( – John Zwinck Sep 26 '14 at 09:30
  • Use MELT 1.1.2 at least (or even the latest snapshot) and please report bugs on `gcc-melt@googlegroups.com` - and subscribe to it – Basile Starynkevitch Sep 26 '14 at 09:37
  • I see it as very bad idea to use some 'feature' that not everyone will have access to when they have to re-build and/or maintain the code. – user3629249 Sep 27 '14 at 17:33