
Here, I have the following:

>>> import numpy as np
>>> q = np.nan
>>> q == np.nan
False
>>> q is np.nan
True
>>> q in (np.nan, )
True

So the question is: why is nan not equal to nan, yet `q is np.nan` is True? And why does `in` return True? I can't trace down the implementation of nan. The trail leads to C:\Python33\lib\site-packages\numpy\core\umath.pyd (the line NAN = nan), but from there I can find no way to see what nan actually is.

mkrieger1
  • There are many things that are "not a number": ships, dinosaurs, cookies... – BlackBear Mar 17 '17 at 11:37
  • Possible duplicate of [What is the rationale for all comparisons returning false for IEEE754 NaN values?](http://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values) –  Mar 17 '17 at 11:40
  • `NaN` in numpy at least has the property that `np.nan != np.nan` – EdChum Mar 17 '17 at 11:41
  • Yes, I see, they are not equal. But you are missing three important points of the question: 1) Why (q is np.nan) is True 2) Why exactly (q in (np.nan, )) works 3) How is nan implemented. – Anton Bohdanov Mar 17 '17 at 11:44

1 Answer


Most comparisons with nan, including ==, yield False (only != yields True). This is not really a numpy decision: it is specified by the IEEE 754 floating-point standard, and np.nan is just an ordinary Python float holding that value. For your own classes you can get the same behaviour in Python by defining an __eq__(self, other) method that returns False. The behaviour is useful in practice. nan often stands in for a missing or undefined value, and the fact that one entry has a missing value and another entry also has a missing value does not imply that those two entries are equal. It only means you don't know whether they are equal, so it is best not to treat them as if they were (e.g. when you join two tables together by pairing up corresponding rows).
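As a rough illustration of the comparison rules described above, here is a minimal sketch. It uses `math.nan` from the standard library so that it runs without numpy; this relies on `np.nan` being a plain Python float as well. The class `AlwaysUnequal` is a hypothetical name, not anything that numpy defines.

```python
import math

nan = math.nan  # a plain Python float, used here in place of np.nan

# Collect the result of every comparison operator applied to NaN and itself.
results = {
    "==": nan == nan,
    "!=": nan != nan,
    "<": nan < nan,
    "<=": nan <= nan,
    ">": nan > nan,
    ">=": nan >= nan,
}
print(results)  # only "!=" maps to True


class AlwaysUnequal:
    """Hypothetical class that mimics NaN's equality behaviour."""

    def __eq__(self, other):
        # Never equal to anything, not even to itself.
        return False


x = AlwaysUnequal()
print(x == x)  # False, because __eq__ always says so
print(x is x)  # True, because identity cannot be customised
```

In Python 3 the default `!=` inverts the result of `__eq__`, so `x != x` is True for this class, just as it is for nan.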

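`is`, on the other hand, is a Python operator that numpy cannot override. It tests whether two names refer to the same object. Since q was assigned from np.nan, both names refer to the same float object, so `q is np.nan` is True. Note that this only identifies that one particular object: a nan created elsewhere, such as float('nan') or the result of a computation, is generally a different object. So if you want to get rid of all entries which don't have a value, use np.isnan (or math.isnan) rather than `is not np.nan`.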
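The difference between an identity test and a real NaN test can be sketched as follows. This is a minimal example using the standard library's `math.nan` in place of `np.nan` (an assumption made so that it runs without numpy); the variable names are made up for illustration.

```python
import math

q = math.nan          # q refers to the module-level NaN object
other = float("nan")  # a NaN created separately: a different object

print(q is math.nan)      # True: same object
print(other is math.nan)  # False: a NaN, but not the same object

values = [1.0, other, 2.5, q]

# Filtering by identity only removes the one shared NaN object.
by_identity = [v for v in values if v is not math.nan]

# Filtering with isnan removes every NaN, however it was created.
by_isnan = [v for v in values if not math.isnan(v)]

print(len(by_identity))  # 3: `other` slipped through
print(by_isnan)          # [1.0, 2.5]
```

With numpy arrays, the analogous filter would use `np.isnan` on the array instead of a list comprehension.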

`nan in (nan,)` returns True because (nan,) is a tuple with only one element, nan, and when Python checks whether an object is in a tuple, it checks whether that object `is` or `==` any element of the tuple. The identity check succeeds here, so it does not matter that the == comparison fails.
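The membership rule can be written out by hand. The helper `contains` below is a hypothetical function that mirrors the documented semantics of `in` for tuples and lists; it is a sketch, not how CPython implements the check internally. `math.nan` stands in for `np.nan` so the example runs without numpy.

```python
import math


def contains(item, container):
    """Hypothetical helper: identity is tried first, then equality."""
    return any(item is element or item == element for element in container)


q = math.nan
fresh = float("nan")  # a distinct NaN object

print(q in (math.nan,))            # True: q is the very same object
print(contains(q, (math.nan,)))    # True: the helper agrees

print(fresh in (math.nan,))          # False: not identical, and NaN != NaN
print(contains(fresh, (math.nan,)))  # False: the helper agrees
```

So `in` only "finds" a nan when the container holds the same object, which is exactly the situation in the question.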

Denziloe
  • Great answer, thanks. But it's kind of missing these two points from the question: 1) Why exactly does (q in (np.nan, )) work? 2) How is nan implemented? – Anton Bohdanov Mar 17 '17 at 11:47
  • I did answer how it's implemented; it has an `__eq__` method which returns `False`. As to why `q in (nan,)` works -- basically because `in` checks whether an object `is` any object in the tuple. I will update my answer to reflect this. – Denziloe Mar 17 '17 at 11:49
  • I meant: do you know where I can find the actual implementation of nan in the numpy library? I do want to read it. – Anton Bohdanov Mar 17 '17 at 11:53
  • Typo: you write "in on the other hand" when you mean "is". More importantly: "and when Python checks if an object is in a tuple, it is checking whether that object `is` any object in the tuple" isn't true -- it checks if the object `is` *or* `==` any object in the tuple. – DSM Mar 17 '17 at 12:05
  • I'd make the slight correction that ``nan`` isn't a missing value, that is represented by ``na``. – James Elderfield Mar 17 '17 at 12:11
  • @AntonBohdanov Sorry, I don't know that. You'll have to track it down yourself if you're very keen. I feel like I've answered your original questions though. – Denziloe Mar 17 '17 at 12:12
  • @JamesElderfield I'm not super familiar with `numpy` but `numpy.na` doesn't seem to be a thing? – Denziloe Mar 17 '17 at 12:14
  • @Denziloe Good catch, it's numpy.NA. See https://docs.scipy.org/doc/numpy-1.10.0/neps/missing-data.html for its use and comparison with ``nan``. – James Elderfield Mar 17 '17 at 12:16
  • I actually checked that and I have no `numpy.NA` either -- perhaps it's deprecated? – Denziloe Mar 17 '17 at 12:18
  • @Denziloe Aha no I need to read better - the linked document is an enhancement proposal rather than documentation of an existing feature – James Elderfield Mar 17 '17 at 13:54
  • @Denziloe Rather, it's better to say that the authors of the IEEE-754 floating point standard decided that it was better to have `NaN == NaN` be false. numpy (and Python) just expose the C compiler's IEEE-754 floating point semantics. – Robert Kern Mar 17 '17 at 16:44