With NaN, it is possible to get a list that will not properly sort:

--> NaN = float('nan')
--> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
--> sorted(spam)
[1, 2, nan, 3, nan, 4, 5, 7, nan]

I'm constructing a Null object that will behave a lot like NaN, with the semantics that if the returned object is Null, its actual value is unknown. A Null object will also be able to interact with any other type of object (int, float, str, bool, etc.), but any interaction will result in Null.
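To make the intended behaviour concrete, here is a minimal sketch of such an object. The names `NullType` and `_absorb` are hypothetical, only a handful of arithmetic operators are covered, and the singleton design is an assumption rather than a requirement:

```python
class NullType(object):
    """Hypothetical singleton standing in for an unknown value."""
    _instance = None

    def __new__(cls):
        # Assumption: one shared Null, so `x is Null` works as a test.
        if cls._instance is None:
            cls._instance = super(NullType, cls).__new__(cls)
        return cls._instance

    def _absorb(self, other=None):
        # Any interaction with an unknown value is itself unknown.
        return self

    __add__ = __radd__ = __sub__ = __rsub__ = _absorb
    __mul__ = __rmul__ = __truediv__ = __rtruediv__ = _absorb
    __neg__ = _absorb

    def __repr__(self):
        return 'Null'


Null = NullType()
```

With this sketch, `Null + 1`, `1.5 - Null` and `'abc' * Null` all evaluate to `Null`. Comparisons are deliberately left out, since they are the open question here.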

From a purist point of view if it is unknown then comparison results are also unknown since the actual value might be greater, lesser, or the same as the value being compared against.

From a practical point of view a list with Nulls scattered throughout is a pain in the backside.

So I am strongly leaning towards implementing the comparisons such that Null objects are less than other objects so they will always sort together.

Of course, I could always dodge the issue and force the user to implement custom sort keys.
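For reference, the custom-key route might look roughly like this for the NaN list above. The helper name `nan_first` is made up for illustration, and putting the NaNs first rather than last is an arbitrary choice:

```python
import math


def nan_first(value):
    # Hypothetical key: NaN maps to (0, 0.0), everything else to (1, value),
    # so no NaN ever takes part in a comparison.
    if isinstance(value, float) and math.isnan(value):
        return (0, 0.0)
    return (1, value)


NaN = float('nan')
spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
result = sorted(spam, key=nan_first)
# The three NaNs end up at the front, followed by 1, 2, 3, 4, 5, 7.
```

The same shape of key would work for a Null object by swapping the `isnan` test for an identity check.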

Any thoughts/advice/criticisms/etc?

Ethan Furman
  • Without knowing any more about the problem domain it's difficult to give criticism. "Purity" is rarely the best motivation in design. What are the practical tradeoffs? – jeffknupp Jan 13 '12 at 19:10
  • Similar question here: [Python: sort function breaks in the presence of nan](http://stackoverflow.com/q/4240050/222914) – Janne Karila Jan 13 '12 at 19:14
  • Agreed with jknupp. Any code handling these 'Null' values will probably have to include special-case code for them anyway, so it makes more sense to require the user to `filter` them out before sorting. The minor amount of extra code increases comprehensibility. – twooster Jan 13 '12 at 19:16
  • Your question and title talk about null objects, while your code uses NaN (not a number). Those are completely different things, making it very hard to guess what your question is. NaN is not a Null object in any reasonable sense. – Lennart Regebro Jan 13 '12 at 21:16
  • I had to rewrite code from someone else who tried to have this kind of "null that works with numbers secretly and silently". The problem they had was that `if` statements and filters were still required. After reviewing thousands of lines of their code, I found that `None` worked just as well as what they had. A few expressions required `None if x is None else ` guards, but they were so few that it was a win to give up on the "special math-tolerant null". – S.Lott Jan 13 '12 at 22:57

2 Answers

NaN is commonly defined as being not comparable to anything. Any computation involving NaN is supposed to return NaN.

In fact:

>>> print float('nan') == float('nan')
False

Yes: NaN is not even equal to itself. There are good reasons for this, although it is indeed counter-intuitive. The main reason is probably that, in contrast to all other numbers, there is no unique way to place NaN in an ascending order. Should it come first or last? Before or after infinity? Floating point numbers have a couple of odd properties, but at least there is no doubt that -infty < -123 < -0 <= +0 < 123 < +infty.

It's "not a number", so how can it be larger than, smaller than, or equal to a number?

Of course you can define a custom compare function that has a well-defined sorting behavior for NaN values:

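import math

def s(x, y):
  # NaN sorts after every number; two NaNs compare as equal.
  if math.isnan(x): return 0 if math.isnan(y) else 1
  if math.isnan(y): return -1
  return cmp(x, y)  # Python 2 built-in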

Note how I'm using math.isnan. This function has clear semantics: it sorts all numbers first, then any NaN value.
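On Python 3, where `cmp()` and the `cmp=` argument to `sorted` no longer exist, the same idea can be sketched with `functools.cmp_to_key`. The name `nan_last` and the sample list are only illustrative:

```python
import math
from functools import cmp_to_key


def nan_last(x, y):
    # Treat NaN as larger than every number and equal to other NaNs.
    x_nan, y_nan = math.isnan(x), math.isnan(y)
    if x_nan or y_nan:
        return x_nan - y_nan  # 1, -1 or 0
    return (x > y) - (x < y)  # stand-in for Python 2's cmp()


values = [3.0, float('nan'), 1.0, float('nan'), 2.0]
ordered = sorted(values, key=cmp_to_key(nan_last))
# The numbers come out in ascending order, followed by the two NaNs.
```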

Has QUIT--Anony-Mousse

If the Null object implements comparison behavior, other methods (such as indexing) will get more complicated. Consider:

target = table.sql('select * where sales < 1000.00')

If Null values compare < all other objects then target could have rows where there were no sales (which is not the goal).

So, I think practicality and purity are both coming down on the same side on this one: Null comparisons yield unknown. Users will have to decide what to do with Null values if they get them.
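A rough sketch of what "comparisons yield unknown" could look like in plain Python. `NullType` here is hypothetical, and treating Null as falsy (so that filters drop unknown rows, much as SQL does) is an assumption rather than a settled part of the design:

```python
class NullType(object):
    """Hypothetical unknown value: comparisons return Null, which is falsy."""

    def _unknown(self, other):
        # The result of comparing against an unknown value is unknown.
        return self

    __lt__ = __le__ = __gt__ = __ge__ = __eq__ = __ne__ = _unknown
    __hash__ = object.__hash__  # keep Null hashable despite custom __eq__

    def __bool__(self):
        return False
    __nonzero__ = __bool__  # Python 2 spelling

    def __repr__(self):
        return 'Null'


Null = NullType()

sales = [250.0, Null, 1200.0, 999.99, Null]
target = [s for s in sales if s < 1000.00]
# Rows with unknown sales are left out of target.
```

The cost is that sorting a list containing such a Null has the same problem as sorting NaN, so the user still has to filter or supply a key.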

Ethan Furman