7

I have some numerical code that was developed on AMD64 Linux (using LLVM 3.2).

I have recently ported it to OSX 10.9 with XCode. It runs fine, but it fails a lot of the unit tests: it seems that some calculations which return NaN (or -NaN) on Linux now return -NaN (or NaN) on OSX.

Can I safely assume that positive and negative NaNs are equivalent and adjust my unit tests to accept either as a success, or is this a sign of something more serious going wrong?

David Given
  • Something serious may already be going wrong if your NaNs have signs. That's not normal. – user2357112 Jan 25 '14 at 11:20
  • http://stackoverflow.com/a/8817304/267482 – bobah Jan 25 '14 at 11:23
  • Huh. I thought the sign bit was always ignored; looks like there are systems that display it. That's probably fine, then. – user2357112 Jan 25 '14 at 11:27
  • It is a bit odd that your tests distinguish different NaNs in the first place. If `x` and/or `y` is a NaN, then `x == y`, `x < y`, and `x > y` all return false; there is no way to distinguish NaNs by numerical comparison. This suggests that either your tests are examining the bits that represent NaNs or are examining some transformation of the NaNs, such as the characters produced by using `printf` on a NaN. In the former case, somebody has decided which bits are important, and you should understand why. In the latter case, you are depending on implementation-dependent properties of `printf`. – Eric Postpischil Jan 25 '14 at 11:37
  • Yup, the tests are just diffing the output of printf. And of course I know I need to understand which bits are important --- that's why I asked the question! – David Given Jan 25 '14 at 11:54

2 Answers

19

There is no notion of a "negative NaN" in IEEE-754 arithmetic. The NaN encoding still has a sign bit, and there is a notion of a "sign bit" operation which uses or affects this bit (copysign, abs, a few others), but it does not have any meaning when the NaN encoding is interpreted as a value. Many print routines happen to print the bit as a negative sign, but it is formally meaningless, and therefore there isn't much in the standard to govern what its value should be (except w.r.t. the aforementioned functions).

Here's the relevant section of IEEE-754 (2008):

Conversion of a quiet NaN in a supported format to an external character sequence shall produce a language-defined one of “nan” or a sequence that is equivalent except for case (e.g., “NaN”), with an optional preceding sign. (This standard does not interpret the sign of a NaN.)

So your platform's conversion functions may print the "sign" of NaN values, but it has no meaning, and you shouldn't consider it for the purposes of testing.
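
As a minimal illustration (a sketch, assuming a C99 <math.h>): copysign can set the "sign" bit of a NaN, printf may choose to display it, but nothing that interprets the datum as a value can tell the two encodings apart.

#include <math.h>
#include <stdio.h>

int main(void) {
    double n = nan("");            /* a quiet NaN, sign bit clear */
    double m = copysign(n, -1.0);  /* same NaN, sign bit set */

    /* The conversion functions may (but need not) show the sign bit. */
    printf("%f %f\n", n, m);       /* e.g. "nan -nan" with glibc; other libraries may differ */

    /* Classification and comparison cannot distinguish them. */
    printf("isnan: %d %d\n", isnan(n), isnan(m));
    printf("n == m: %d, n != m: %d\n", n == m, n != m);   /* 0 and 1: NaNs compare unequal to everything */
    return 0;
}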

Edited to be a bit stronger: it is almost always a bug to attach meaning to the "sign bit" of a NaN datum.

Stephen Canon
  • 1
    However, the fact that there are both results that are “NaN” on the first system and “-NaN” on the second and results that are “-NaN” on the first and “NaN” on the second suggests that something different is occurring in the earlier floating-point operations, not just the final `printf`. There may be a numerical result that has a different sign on the two systems before a NaN is generated. The fact that the test program has not identified errors in numerical results (non-NaN results) could be because the tests are insufficient. So it might be worth investigating why the signs differ. – Eric Postpischil Jan 25 '14 at 12:34
  • 3
    The more likely explanation is that there is a library function that preserves the "sign" of NaN on one platform but not on the other (which is fine, because the "sign" has no meaning). This is very common; one platform has `if (isnan(x)) return x`, another has `if (isnan(x)) return NAN`, or similar. The only conditions under which I would say further investigation is appropriate would be if the routine under test contains nothing but basic arithmetic operations (no library calls) and is being run on identical hardware. – Stephen Canon Jan 25 '14 at 12:36
  • @StephenCanon: Even when using identical hardware, it's possible that code is doing `a+b` in one case and `b+a` in another; addition is expected to always be commutative except when adding positive and negative zero, or both operands are NaN (if only one is NaN, the result should be that NaN verbatim, but if both are NaN the standard is silent as to whether the result should be the first, the second, the "maximum", the "bitwise OR", or something else). – supercat Feb 14 '14 at 17:54
  • @supercat: yup (but addition of +0 and -0 **is** commutative). – Stephen Canon Feb 14 '14 at 17:59
  • @StephenCanon: According to WIKI, (+0)+(-0) yields (+0), but (-0)+(+0) yields (-0). Such a rule is IMHO silly since making addition non-commutative purely for an extremely narrow and generally-irrelevant corner case has the effect of making it non-commutative nearly everywhere, but that's not quite so bad as the failure to define any form of comparison that can be used by itself as an equivalence relation (the best is `!((a < b) || (a > b))`, but that's icky). – supercat Feb 14 '14 at 18:08
  • @supercat: I’m afraid that wiki is making things up. According to an IEEE-754 committee member (me), (+0) + (-0) in any order is +0, except if the rounding mode is round-toward-minus-infinity, in which case it is -0. – Stephen Canon Feb 14 '14 at 18:20
  • If you don’t want to take my word for it, here’s the relevant section of the standard: "When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0 in all rounding-direction attributes except roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be −0." – Stephen Canon Feb 14 '14 at 18:21
  • @StephenCanon: Thanks for that info. Do you know any better way of implementing an equivalence relations for doubles than the one I gave above? Is there any two-operand combination which will impose a total ordering [whether or not NaN has a numerical "order" with regard to anything else, operations like sorting require a total order]. Personally, I'd like to see languages include a couple floating-point types which mostly behave like IEEE *but* defined relational operators in a fashion suitable for database use; what would you think of such a thing? – supercat Feb 14 '14 at 18:27
  • @supercat: IEEE-754 (2008) added the `totalOrder` predicate (though that may be slightly stricter than what you want). It will take a little while to filter into language standards and then into implementations, however; there’s a series of current TRs to provide C bindings for all the new features. It seems like you don’t really need new *types*, just a few extra comparison predicates (some of which are already in the standard, but will take a while to be widely usable). – Stephen Canon Feb 14 '14 at 18:34
  • @StephenCanon: From the hardware/computational standpoint, adding new predicates makes more sense than adding new types. From a *programming-language* standpoint, however, allowing common usage patterns to be specified declaratively rather than imperatively would seem cleaner and less likely to produce bugs. For example, I think it would be helpful for languages to have a type which means `IEEE single`, which would require all operations to be performed using 32-bit values even if the hardware could work faster with 64- or 80-bit values, and which would not implicitly promote to anything... – supercat Feb 14 '14 at 18:45
  • ...and a `short double` type which would be stored as 32 bits but would request that 64- or 80-bit math be used for all intermediate calculations involving more than one operation (e.g. `f1=f2+f3+f4;`), and would accept implicit casts to and from longer types. A `fast single` would be similar, but the promotion to 64 bits could be skipped if 32-bit math would be faster. All three types would be stored as 32 bits, but performance could be improved when using the third, and many accidental mistakes could be caught when using the first or avoided when using the second. – supercat Feb 14 '14 at 18:51
4

It depends entirely on what your unit tests are testing.

Most likely you'll be able to treat them as equivalent, unless what you're testing is the IEEE-754 floating-point implementation itself or the C runtime code that prints the values. Otherwise, treat them as identical exactly when the code that uses what you're testing treats them as identical.

That's because the tests should echo your real usage, in every circumstance. An (admittedly contrived) example is if you're testing the function doCalc() which returns a double. If it's only ever used thus:

double x = doCalc();
if (isnan(x))                 /* any sort of NaN, whatever its sign or payload */
    doSomethingWithNan();

then your test should treat all NaN values as equivalent. However, if you use it thus:

double x = doCalc();
if (isnan(x) && !signbit(x))        /* "+NaN": sign bit clear */
    doSomethingForPositive();
else if (isnan(x) && signbit(x))    /* "-NaN": sign bit set */
    doSomethingForNegative();

then you'll want to treat them as distinct.

Similarly, if your implementation creates a useful payload in the fractional bits (see below), and your real code uses that, it should be checked by the unit tests as well.


Since a NaN is simply an all-ones exponent with something other than all-zero bits in the fraction, the sign bit may be either set or clear, and the fraction bits may take a wide variety of values. However, it's still a value or result that was outside the representation of the data type, so if that's all you were expecting, it probably makes little difference what the sign or payload contain.
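
If a test really does need to look at those bits, a sketch of one approach (assuming double is a 64-bit IEEE-754 binary64; the describe helper is just an illustrative name) is to copy the value into an integer and mask out the fields:

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumes double is a 64-bit IEEE-754 binary64. */
static void describe(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* read the representation without aliasing tricks */

    unsigned sign     = (unsigned)(bits >> 63);
    unsigned exponent = (unsigned)((bits >> 52) & 0x7FF);
    uint64_t fraction = bits & 0xFFFFFFFFFFFFFull;

    /* NaN: exponent all ones, fraction non-zero. */
    int is_nan = (exponent == 0x7FF) && (fraction != 0);
    printf("sign=%u exponent=%#x payload=%#llx nan=%d\n",
           sign, exponent, (unsigned long long)fraction, is_nan);
}

int main(void) {
    describe(nan(""));                   /* quiet NaN */
    describe(copysign(nan(""), -1.0));   /* NaN with the sign bit set */
    describe(1.0);                       /* ordinary value for contrast */
    return 0;
}

Using memcpy rather than a pointer cast keeps the inspection well-defined C while still exposing the sign bit and payload.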

In terms of checking the textual output of NaN values, the Wikipedia page on NaN indicates that different implementations may give you widely varying outputs, among them:

nan
NaN
NaN%
NAN
NaNQ
NaNS
qNaN
sNaN
1.#SNAN
1.#QNAN
-1.#IND

and even variants showing the varying sign and payload, which have no effect on the value's NaN-ness:

-NaN
NaN12345
-sNaN12300
-NaN(s1234)

So, if you want your unit test to be massively portable, note that all of the output representations bar one contain some variant of the string "nan". A case-insensitive search through the value for the string "nan" or "ind" would therefore pick them all up. That may not work in absolutely every environment, but it has very wide coverage.

For what it's worth, the C standard has this to say about outputting floating point values with %f (%F uses uppercase letters):

A double argument representing a NaN is converted in one of the styles [-]nan or [-]nan(n-char-sequence) - which style, and the meaning of any n-char-sequence, is implementation-defined.

So it would suffice there to simply check if the value had nan somewhere inside it.
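
A sketch of such a check (looks_like_nan is just an illustrative name; strcasestr is a GNU extension, so the lower-casing is done by hand):

#include <ctype.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive test for "nan" or "ind" anywhere in a formatted value. */
static int looks_like_nan(const char *s) {
    char lowered[64];
    size_t i;
    for (i = 0; s[i] != '\0' && i + 1 < sizeof lowered; i++)
        lowered[i] = (char)tolower((unsigned char)s[i]);
    lowered[i] = '\0';
    return strstr(lowered, "nan") != NULL || strstr(lowered, "ind") != NULL;
}

int main(void) {
    char buf[64];
    snprintf(buf, sizeof buf, "%f", nan(""));   /* the exact text is implementation-defined */
    printf("\"%s\" -> %d\n", buf, looks_like_nan(buf));
    printf("\"-1.#IND\" -> %d\n", looks_like_nan("-1.#IND"));
    return 0;
}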

paxdiablo
  • That doesn't quite answer my question, though --- I need to know whether the sign bit in the NaN is important enough for my unit tests to pay attention to. I'm really surprised that OSX and Linux produce different results here: it's the same processor and the same compiler and I thought that the IEEE float spec didn't give any wriggle room to produce different results here. Is this something I need to care about? – David Given Jan 25 '14 at 11:59
  • @David, as I stated, it depends on your tests. Unless you're specifically testing the output of the NaN, you can almost certainly safely assume all NaNs are identical, since any code that finds one will almost certainly treat them the same way. If you're testing a function whose callers will behave differently based on sign, then you should treat them _differently._ But that would be very unusual. I'll clarify. – paxdiablo Jan 25 '14 at 12:02
  • There is no meaningful notion in which a value can be +NaN or -NaN, as NaN does not have a sign; "treating the values as distinct" is a bug in a program, if the intent is to run on IEEE-754 arithmetic. – Stephen Canon Jan 25 '14 at 12:19
  • @Stephen, while ieee754 may not distinguish, the _output_ of C allows for that information to be presented so, if that's what's being checked, you need to allow for it. – paxdiablo Jan 25 '14 at 13:00
  • I'm not saying that the sign "doesn't exist" or "has no observable effect"; I'm saying that any program that looks like your second code snippet has a bug, so there is [almost] never a good reason to treat them as distinct. – Stephen Canon Jan 25 '14 at 13:15
  • So under what situations *is* it important to treat them as distinct? There must be some, otherwise printf wouldn't be rendering them distinctly. – David Given Jan 25 '14 at 15:44
  • 1
    `printf` may render them distinctly for a variety of reasons: because the standard allows it and the behavior happens to fall out naturally from an implementation, because the guy who wrote the converter for the format specifier in question didn't know that there's no reason to distinguish them, or because they had some specific use that was relevant to their needs but outside the scope of the standard. The sign bit of an IEEE-754 NaN has no semantic meaning. – Stephen Canon Jan 25 '14 at 16:23