8

In my opinion, things like float('nan') should be optimized, but apparently they aren't in Python.

>>> NaN = float('nan')
>>> a = [ 1, 2, 3, NaN ]
>>> NaN in a
True
>>> float('nan') in a
False

Does it have any meaning with not optimizing nan like other things? In my thought, nan is only nan.

As well as this, when you use sorted on these things, they give weird results:

>>> sorted([3, nan, 4, 2, nan, 1])
[3, nan, 1, 2, 4, nan]


>>> 3 > float('nan')
False
>>> 3 < float('nan')
False

The comparison on nan is defined like this, but it doesn't seems 'pythonic' to me. Why doesn't it raise an error?

Alex Waygood
  • 6,304
  • 3
  • 24
  • 46
Ohamma
  • 99
  • 3
  • I like the observation and I think it's confusing, too. Which of the 7 statements should raise an exception? – Thomas Weller Sep 25 '21 at 16:47
  • In C# the result is `[NaN, NaN, 1,2,3,4]` which is more pythonic, IMHO. – Thomas Weller Sep 25 '21 at 16:52
  • The simple answer is "because that's how IEEE754 defines it for reasons that, at least at the time it was defined made sense to at least the people defining the standard". – Jörg W Mittag Sep 25 '21 at 16:55
  • In fact; the idea is that NaN is "not a value" so its comparison with any value is always False, including with itself. – Edouard Thiel Sep 25 '21 at 16:59
  • @JörgWMittag: do you have a reference where IEEE754 defines how floats shall be sorted? I'd like to read that – Thomas Weller Sep 25 '21 at 17:01
  • The implementation of NaN is also consistent with other operations: 2 + NaN gives NaN for instance. – Edouard Thiel Sep 25 '21 at 17:03
  • For the sorting, I think that'll depend on the sorting algorithm, and how stable it is. If the sorting algorithm compares for *in*equality, the NaN would be regarded as equal to the value compared to (since `x > NaN` evaluates to False, so the sorting algorithm will see them equal, or rather, as not-non-equal), and leave it in-place (stable sort). – 9769953 Sep 25 '21 at 17:04
  • 1
    One interesting feature of NaN is that it breaks the "(a is b) implies (a==b)" protocol, and the `in` operator checks identity before it checks equality. This means that `x in somelist` can be true at the same time as `all(x != y for y in somelist)`. – Ture Pålsson Sep 25 '21 at 17:05
  • Note that NumPy produces `array([ 1., 2., 3., 4., nan, nan])`. – 9769953 Sep 25 '21 at 17:08
  • @JörgWMittag: .NET also uses IEEE754 for floats, but still behaves differently. – Thomas Weller Sep 25 '21 at 17:09
  • Also https://stackoverflow.com/questions/4240050/python-sort-function-breaks-in-the-presence-of-nan (seems to be close enough to count as a duplicate of this question). – 9769953 Sep 25 '21 at 17:09
  • 3
    What do you mean by "optimize"? – hobbs Sep 26 '21 at 06:38
  • For "optimization" I meant making the id of `float('nan')` all the same for memory savings. Sorry if it sounded vague. – Ohamma Sep 26 '21 at 14:09

1 Answers1

9

Membership testing

Two different instances of float('nan') are not equal to each other. They are "Not a Number" so it makes sense that they shouldn't also have to be equal. They are different instances of objects which are not numbers:

print(float('nan') == float('nan'))  # False

As documented here:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).

There is a checking for identity! that's why you see that behavior in your question and why NaN in a returns True and float('nan') in a doesn't.


Sorting in Python

Python uses the Timsort algorithm for its sorted() function. (Also see this for a textual explanation.) I'm not going to go into that. I just want to demonstrate a simple example:

This is my class A. It's going to be our float('nan') object. It acts like float('nan') in that it returns False for all comparison operations:

class A:
    def __init__(self, n):
        self.n = n

    def __lt__(self, other):
        print(self, 'lt is calling', other)
        return False

    def __gt__(self, other):
        print(self, 'gt is calling', other)
        return False

    def __repr__(self):
        return f'A({self.n})'

class B:
    def __init__(self, n):
        self.n = n

    def __lt__(self, other):
        print(self, 'lt is calling', other)
        return False

    def __gt__(self, other):
        print(self, 'gt is calling', other)
        return False

    def __repr__(self):
        return f'B({self.n})'

When we use the sorted() function (or the .sort() method of a list) without the reverse=True argument, we're asking for the iterable to be sorted in ascending order. To do this, Python tries to call the __lt__ method successively, starting from the second object in the list to see if it is less than its previous object and so on:

lst = [A(1), B(2), A(3), B(4)]
print(sorted(lst))

output :

B(2) lt is calling A(1)
A(3) lt is calling B(2)
B(4) lt is calling A(3)
[A(1), B(2), A(3), B(4)]

Now, switching back to your example:

lst = [3, A(1), 4, 2, A(1), 1]
print(sorted(lst))

output:

A(1) lt is calling 3
A(1) gt is calling 4
A(1) gt is calling 2
A(1) lt is calling 2
A(1) lt is calling 4
A(1) gt is calling 1
[3, A(1), 1, 2, 4, A(1)]
  1. A(1).__lt__(3) will return False. This means A(1) is not less than 3 or This means 3 is in correct position relative to A(1).
  2. Then here int.__lt__(4, A(1)) gets called and because it returns NotImplemented object, Python checks to see if A(1) has implemented __gt__ and yes, so A(1).__gt__(4) will return False again and this means the A(1) object is in correct place relative to 4.
  3. (Etc.)

This is why the result of sorted() seems to be weird, but it's predictable. A(1) object in both cases, I mean when int class returns NotImplemented and when __lt__ gets called from A(1), will return False.

It's better to check the Timsort algorithm and consider those points. I would include the remaining steps if I read Timsort algorithm carefully.

Alex Waygood
  • 6,304
  • 3
  • 24
  • 46
S.B
  • 13,077
  • 10
  • 22
  • 49
  • @don'ttalkjustcode Thanks, corrected and also updated with better example. – S.B Sep 26 '21 at 07:31
  • Yes, nicer with the B. Could also make B behave like an int instead, i.e., compare normally with other B objects and return NotImplemented with A objects. Then we would see that `int.__lt__(4, A(1))` call in the output – no comment Sep 26 '21 at 11:15