I'm experimenting with NaN values and it turns out that sorting tuples containing NaN doesn't work very well.

>>> import random
>>> tuples = [(float('nan'), i) for i in range(7)]
>>> random.shuffle(tuples)
>>> sorted(tuples)
[(nan, 6), (nan, 0), (nan, 2), (nan, 5), (nan, 4), (nan, 3), (nan, 1)]

This kinda makes sense, considering that `==`, `<`, and `>` between NaN and NaN all return False, as explained in this question.

>>> float('nan') == float('nan') 
False
>>> float('nan') < float('nan') 
False
>>> float('nan') > float('nan') 
False

However, when I change my example slightly, it is suddenly possible to sort the list of tuples.

>>> nan = float('nan')
>>> tuples = [(nan, i) for i in range(7)]
>>> random.shuffle(tuples)
>>> sorted(tuples)
[(nan, 0), (nan, 1), (nan, 2), (nan, 3), (nan, 4), (nan, 5), (nan, 6)]

>>> tuples  # it really was shuffled
[(nan, 6), (nan, 0), (nan, 2), (nan, 3), (nan, 1), (nan, 4), (nan, 5)]

What's going on here? Why is it possible to sort the list in the second example, but not the first one?

Håken Lid
  • I would guess that because all your `nan` are the same instance in the second example, the sort can figure out to regard them as equal. Perhaps there is an `x is y` shortcut in the sort code. – khelwood Feb 24 '18 at 18:45

1 Answer

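Tuple comparison assumes that the `<`, `==`, and `>` operations on the elements form a valid weak ordering. It compares elements with `PyObject_RichCompareBool`, which assumes that `x is y` implies `x == y`. When you use the same NaN object for all tuples, `PyObject_RichCompareBool` treats the NaNs as equal, so the comparison falls through to the second element of each tuple and the sort works. In the first example every tuple holds a distinct NaN object, so every `<` comparison between tuples returns False and the sort has no basis for reordering them.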
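To make the identity shortcut concrete, here is a small sketch of how it shows up in ordinary Python code. The variable names (`nan`, `other`, and the result names) are only illustrative and do not come from the question; the behaviour shown is what CPython does for tuple and list comparisons.

```python
nan = float("nan")    # one NaN object, reused below
other = float("nan")  # a second, distinct NaN object

# At the float level, IEEE 754 semantics apply: NaN is not equal to itself.
float_eq = nan == nan                    # False

# Container comparisons check identity first and only then fall back to ==.
same_tuple_eq = (nan, 1) == (nan, 1)     # True: the same object in both tuples
diff_tuple_eq = (nan, 1) == (other, 1)   # False: two distinct NaN objects
in_list = nan in [nan]                   # True: identity shortcut again
other_in_list = other in [nan]           # False

# With a shared NaN the first elements count as equal, so ordering is
# decided by the second element. With distinct NaNs the first elements
# are neither equal nor ordered, so < is False.
shared_lt = (nan, 0) < (nan, 1)          # True
distinct_lt = (nan, 0) < (other, 1)      # False

print(float_eq, same_tuple_eq, diff_tuple_eq, shared_lt, distinct_lt)
```

This is why the second example in the question sorts by the integer in each tuple, while the first cannot.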

user2357112
  • Does that mean that PyObject_RichCompareBool breaks the IEEE754 floating point standard? I thought the definition required `NaN == NaN` to always be false? It is kinda weird that `nan is nan` can be true even though `nan == nan` is false. – Håken Lid Feb 24 '18 at 18:56
  • Adding explanation to intuition. Thanks. – Mad Physicist Feb 24 '18 at 18:59
  • @HåkenLid: Most of the cases where PyObject_RichCompareBool is used are cases where the `x == x` assumption needs to hold for the operation to be meaningful. By trying to sort tuples with NaNs in them, you've already broken the preconditions of the sort operation. That said, there are probably cases where PyObject_RichCompareBool is used where it really shouldn't be. – user2357112 Feb 24 '18 at 19:07