16

This is more of a 'Can you explain this' type of question than it is anything else.

I came across a problem at work where we were using NaN values in a table, but when the table was sorted, it came out in a very strange, strange manner. I figured NaN was mucking up something so I wrote up a test application to see if this is true. This is what I did.

static void Main(string[] args)
{
    double[] someArray = { 4.0, 2.0, double.NaN, 1.0, 5.0, 3.0, double.NaN, 10.0, 9.0, 8.0 };

    foreach (double db in someArray)
    {
        Console.WriteLine(db);
    }

    Array.Sort(someArray);
    Console.WriteLine("\n\n");
    foreach (double db in someArray)
    {
        Console.WriteLine(db);
    }

    Console.ReadLine();
}

Which gave the result:

Before:

4,2,NaN,1,5,3,NaN,10,9,8

After:

1,4,NaN,2,3,5,8,9,10,NaN

So yes, the NaN some how made the sorted array to be sorted in a strange way.

To quote Fry; "Why is those things?"

Kris Ivanov
  • 10,476
  • 1
  • 24
  • 35
Emmanuel F
  • 1,125
  • 1
  • 15
  • 34
  • whats your IComparable implementation for the array? looks like the default – Kris Ivanov Feb 25 '11 at 23:06
  • There is a really good explanation of this behavior wrt `Math.Max` and `IEnumerable.Max` floating about on SO somewhere. –  Feb 25 '11 at 23:07
  • 5
    I copy/pasted your code into my Visual Studio and ran it, and it sorts it like so: NaN,NaN,1,2,3,4,5,8,9,10. This was with .Net 4. When I switched to .Net 3.5SP1, it sorted like you said. – Phil Feb 25 '11 at 23:07
  • Related (but not duplicate): http://stackoverflow.com/questions/4933769/c-nan-comparison-differences-between-equals-and -- this, however, is in interesting in that `NaN.Equals(NaN)` is true. –  Feb 25 '11 at 23:08
  • @Phil: A bug in .NET 2.0? That's scary. – Steven Feb 25 '11 at 23:11
  • @Steven It's not a bug if it's expected to behave that way ;-) Would be interesting to see some formal documentation on why they act as they do. –  Feb 25 '11 at 23:12
  • 1
    @pst: If it is expected to behave this way, why change it? :-) – Steven Feb 25 '11 at 23:14
  • 1
    For some additional thoughts on ways you can mess up a QuickSort, see my recent series of articles on that subject. Part one is here: http://blogs.msdn.com/b/ericlippert/archive/2011/01/20/spot-the-defect-bad-comparisons-part-one.aspx – Eric Lippert Feb 25 '11 at 23:19
  • @Eric Lippert Right -- but this doesn't explain why `Array.Sort` doesn't seem to be using the `CompareTo` of the double -- double.CompareTo defines ordering wrt. NaN and non-NaN numbers and I can't find a difference between .NET35 and .NET4 from the MSDN documentation. –  Feb 25 '11 at 23:25
  • @Phil well, I'm using 3.5, but it's interesting that 4.0 now sorts it differently. I'm interested in why that was done so. @K Ivanov, That was just a plain vanilla .NET 3.5 sort. Nothing special. – Emmanuel F Feb 25 '11 at 23:33
  • 2
    @Agent I asked the question here: http://stackoverflow.com/q/5123886/138757 Let's hope someone knows. – Phil Feb 25 '11 at 23:34
  • @Phil Cool! I'll keep an eye on it too. It's the finer details of things that I sometimes find interesting. :) – Emmanuel F Feb 25 '11 at 23:36
  • 1
    @Steven After an in-depth look, I do agree that it is a bug in .NET35 (wrt the documentation at least) :-) –  Feb 26 '11 at 00:10

5 Answers5

10

I believe that's because

a < NaN == false
a > NaN == false
a == NaN == false

so the comparison on them breaks down, and that throws off the entire sort.

corsiKa
  • 81,495
  • 25
  • 153
  • 204
  • 1
    this is exactly it. if you want a *reasonable* behavior, you need to write a custom comparator that uses `IsNaN` explicitly. all comparisons with NaN are false, no matter what the comparison is. even `NaN == NaN` will return false, which seems should violate the rules, but doesn't. – John Gardner Feb 25 '11 at 23:07
  • @John thanks for clarification. I knew it was true in Java, but I'm not a C# guy (even if it's my second most upvoted tag for answers... O_o) – corsiKa Feb 25 '11 at 23:08
  • It would be interesting to see this question updated wrt. Phil's comment and behavior changes in .NET35 and .NET4. –  Feb 25 '11 at 23:10
  • the reason is because of QuickSort algorithm; the implementation performs an unstable sort; nothing to do with comparison of the elements – Kris Ivanov Feb 25 '11 at 23:11
  • @K Ivanov Even if unstable it should only NaN, NaN ... 1, 2, 3... if an ordering is well-defined for NaN and NaN/non-NaN. Stability just means that if x1 == x2 then x1, x2 could be sorted x1, x2 or x2, x1 (if unstable). –  Feb 25 '11 at 23:13
  • @K Ivanov what does being stable have to do with not using comparisons? QuickSort is still comparison based. You would get a bad answer if you sorted with MergeSort too... – corsiKa Feb 25 '11 at 23:16
  • @glowcoder This does not explain the .NET4 difference -- does the default (.NET35) comparator use `>/==` for double types? Because the ordering is defined "correctly" for `double.CompareTo`. –  Feb 25 '11 at 23:23
  • Well that makes sense. I knew that you can't strictly compare a value to NaN, but I figured Sorting would have a better way of handling it. For example, using Infinity or -Infinity will yield `1,2,3,4,5,8,9,10,Infinity,Infinity`. (Read this from a SO sorting [question](http://stackoverflow.com/questions/3236889/what-is-the-best-sort-algorithm-for-a-random-set-of-floats/3237791#3237791) ) – Emmanuel F Feb 25 '11 at 23:31
  • @glowcoder I reject this answer (even if I have given an up-vote and will let is remain) because (at least per documentation), there is no use of `>/==` in `Array.Sort` (per documentation) and the behavior is well-defined for `CompareTo/CompareTo(T)` on a double. –  Feb 25 '11 at 23:56
6

Edit (conclusion. final. end.): This is a bug.

See bug-report Bug in List<double/single>.Sort() [.NET35] in list which contains double.NaN and go give Hans Passant an up-vote at the Why does .NET 4.0 sort this array differently than .NET 3.5? from which I ripped the link.

Historical musings

[See the post: Why does .NET 4.0 sort this array differently than .NET 3.5?, where, hopefully, more useful discussion on this particular issue can be figured out for real. I have cross-posted this response there as well.]

The behavior pointed out in .NET4 by Phil is that defined in CompareTo. See double.CompareTo for .NET4. This is the same behavior as in .NET35 however and should be consistent in both versions, per the method documentation...

Array.Sort(double[]): doesn't seem to be using CompareTo(double[]) as expected and this may very well be a bug -- note the difference in Array.Sort(object[]) and Array.Sort(double[]) below. I would love clarification/corrections on the following.

In any case, the answers using > and < and == explain why those operators don't work but fail to explain why Array.Sort leads to unexpected output. Here are some of my findings, as meager as they may be.

First, the double.CompareTo(T) method documentation -- this ordering is well-defined according to the documentation:

Less than zero: This instance is less than value. -or- This instance is not a number (NaN) and value is a number.

Zero: This instance is equal to value. -or- Both this instance and value are not a number (NaN), PositiveInfinity, or NegativeInfinity.

Greater than zero: This instance is greater than value. -or- This instance is a number and value is not a number (NaN).

In LINQPad (3.5 and 4, both have same results):

0d.CompareTo(0d).Dump();                  // 0
double.NaN.CompareTo(0d).Dump();          // -1
double.NaN.CompareTo(double.NaN).Dump();  // 0
0d.CompareTo(double.NaN).Dump();          // 1

Using CompareTo(object) has the same results:

0d.CompareTo((object)0d).Dump();                  // 0
double.NaN.CompareTo((object)0d).Dump();          // -1
double.NaN.CompareTo((object)double.NaN).Dump();  // 0
0d.CompareTo((object)double.NaN).Dump();          // 1

So that's not the problem.

Now, from the Array.Sort(object[]) documentation -- there is no use of >, < or == (according to the documentation) -- just CompareTo(object).

Sorts the elements in an entire one-dimensional Array using the IComparable implementation of each element of the Array.

Likewise, Array.Sort(T[]) uses CompareTo(T).

Sorts the elements in an entire Array using the IComparable(Of T) generic interface implementation of each element of the Array.

Let's see:

LINQPad (4):

var ar = new double[] {double.NaN, 0, 1, double.NaN};
Array.Sort(ar);
ar.Dump();
// NaN, NaN, 0, 1

LINQPad (3.5):

var ar = new double[] {double.NaN, 0, 1, double.NaN};
Array.Sort(ar);
ar.Dump();
// NaN, 0, NaN, 1

LINQPad (3.5) -- NOTE THE ARRAY IS OF OBJECT and the behavior is "expected" per the CompareTo contract.

var ar = new object[] {double.NaN, 0d, 1d, double.NaN};
Array.Sort(ar);
ar.Dump();
// NaN, NaN, 0, 1

Hmm. Really. In conclusion:

I HAVE NO IDEA.

Happy coding.

Community
  • 1
  • 1
1

Conceptually NaN is Not a Number, so comparing to a number makes no sense, hence:

a < NaN = false for all a,
a > NaN = false for all a and
NaN != NaN (!)

To solve this you need to write your own comparer that uses IsNaN to make NaNs smaller than (or larger than) all numbers, so that they will all appear at one end of the sort.

Edit: Here's a sample of the Comparison version:

class Program
{
    private static int NaNComparison(double first, double second)
    {
        if (double.IsNaN(first))
        {
            if (double.IsNaN(second)) // Throws an argument exception if we don't handle both being NaN
                return 0;
            else
                return -1;
        }
        if (double.IsNaN(second))
            return 1;

        if (first == second)
            return 0;
        return first < second ? -1 : 1;
    }

    static void Main(string[] args)
    {
        var doubles = new[] { double.NaN, 2.0, 3.0, 1.0, double.NaN };

        Array.Sort(doubles, NaNComparison);
    }
}
Jackson Pope
  • 14,520
  • 6
  • 56
  • 80
  • +1 because this describes .NET35 behavior, however, any reason for the .NET35/4 differences then? (See Phil's comment). Also, are `>/!=` used or is IComparable? –  Feb 25 '11 at 23:17
0

Actually, the strange sorting behavior is the result of a bug in .NET 3.5. The bug was addressed with .NET 4.0.

The only way to resolve it is to use your own custom comparer, or upgrade to .NET 4.0. See Why does .NET 4.0 sort this array differently than .NET 3.5?

Community
  • 1
  • 1
Phil
  • 6,561
  • 4
  • 44
  • 69
-1

since you are using the default sort which is QuickSort algorithm; the implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved

Kris Ivanov
  • 10,476
  • 1
  • 24
  • 35
  • @K Ivanov Even if unstable it should only NaN, NaN ... 1, 2, 3... if an ordering is well-defined for NaN and NaN/non-NaN. Stability just means that if x1 == x2 then x1, x2 could be sorted x1, x2 or x2, x1 (if unstable). In this case there is non-Nan ... NaN ... non-NaN. –  Feb 25 '11 at 23:15