
While numpy.nan is not equal to numpy.nan, and (float('nan'), 1) is not equal to (float('nan'), 1),

(numpy.nan, 1) == (numpy.nan, 1)

What could be the reason? Does Python first check to see if the ids are identical? If identity is checked first when comparing items of a tuple, then why isn't it checked when objects are compared directly?

satoru

4 Answers


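When you do numpy.nan == numpy.nan, the result is decided by the object's own __eq__. numpy.nan is an ordinary Python float, so float's comparison decides, and it says False. When you compare tuples, Python first checks whether the corresponding items are the very same object (which they are here) and only calls __eq__ on items that are not identical. You can make numpy decide element by element by turning the tuples into numpy arrays: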

>>> import numpy as np
>>> np.array((1, np.nan)) == np.array((1, np.nan))
array([ True, False])

The reason is that float equality follows the IEEE 754 rule that nan != nan: mathematically, nan is undetermined (it could be anything), so it makes sense that it does not equal itself. Tuple ==, on the other hand, does not apply that rule to items that are the same Python object; it treats identical items as equal and only calls their __eq__ otherwise. In the case of (float('nan'), 1) == (float('nan'), 1) it returns False because each call to float('nan') creates a new object, as you can check with float('nan') is float('nan'), which is False.
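The contrast between the tuple result and the array result can be reproduced without numpy. This sketch assumes math.nan as a stand-in for numpy.nan (both are a single float object reused on every access) and uses a list comprehension as a stand-in for numpy's element-wise comparison; it is an illustration, not numpy's implementation.

```python
import math

nan = math.nan  # one float object, standing in for numpy.nan

t1 = (nan, 1)
t2 = (nan, 1)

# Tuple ==: the identical nan objects count as equal without calling float.__eq__.
tuple_result = t1 == t2

# Element-by-element ==: float.__eq__ runs for every pair, so nan == nan is False.
elementwise = [a == b for a, b in zip(t1, t2)]

print(tuple_result)  # True
print(elementwise)   # [False, True]
```

The list mirrors the `array([ True, False])` result above, while the tuple comparison reports True.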

João Abrantes
  • Tuple compare properly calls the same `__eq__()` implementation that `==` calls, so I don't get your argument. – dhke Jun 03 '15 at 08:52
  • @dhke of course it does. I am not comparing (1,nan).__eq__((1,nan)) with (1,nan)==(1,nan). I am comparing (1,nan)==(1,nan) with np.array((1,nan))==np.array((1,nan)) – João Abrantes Jun 03 '15 at 08:57
  • Well, the OP's question is "*What could be the reason?*". Workaround is all fine, but that doesn't change the fact that `numpy.nan` breaks an assumption made by the standard library. – dhke Jun 03 '15 at 09:00

Container objects are free to define what equality means for them, and for most that means one thing is really, really important:

container = [float('nan'), 1]
for x in container:
    # holds even for nan, because `in` checks identity before ==
    assert x in container

So containers typically do an id check before an __eq__ check.
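The invariant can be checked with a plain list holding a nan. In this sketch, membership, count and index all find the very same nan object, while a freshly created nan is never found: it is a different object, and nan != nan.

```python
nan = float('nan')
items = [nan, 1]

# The same object: the identity check succeeds before __eq__ is consulted.
found_same = nan in items
count_same = items.count(nan)
index_same = items.index(nan)

# A different nan object: identity fails, and float.__eq__ returns False.
found_fresh = float('nan') in items

print(found_same, count_same, index_same, found_fresh)  # True 1 0 False
```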

Ethan Furman

When comparing two items of a tuple, Python first checks whether they are the same object.

Note that numpy.nan is numpy.nan, but float('nan') is not float('nan').

In Objects/tupleobject.c, the comparison is carried out like this:

for (i = 0; i < vlen && i < wlen; i++) {
    int k = PyObject_RichCompareBool(vt->ob_item[i],
                                     wt->ob_item[i], Py_EQ);
    if (k < 0)
        return NULL;
    if (!k)
        break;
}

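And in PyObject_RichCompareBool you can see the shortcut: v == w compares the two object pointers, so it is an identity check, and identical objects are reported as equal without calling __eq__: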

if (v == w) {
    if (op == Py_EQ)
        return 1;
    else if (op == Py_NE)
        return 0;
}
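To see how the two C fragments combine for ==, here is a rough pure-Python model. The names rich_compare_bool and tuple_eq are hypothetical, and the model leaves out parts of the real C code such as error handling and the ordering operators (<, <=, and so on).

```python
def rich_compare_bool(v, w):
    """Rough model of PyObject_RichCompareBool(v, w, Py_EQ)."""
    if v is w:           # the pointer test `v == w` in C is an identity check
        return True
    return bool(v == w)  # otherwise fall back to a normal == comparison


def tuple_eq(vt, wt):
    """Rough model of tuple == built on rich_compare_bool."""
    for a, b in zip(vt, wt):
        if not rich_compare_bool(a, b):
            return False  # the first differing item decides the result
    return len(vt) == len(wt)  # no differing item: compare lengths


nan = float('nan')
print(tuple_eq((nan, 1), (nan, 1)))                    # True: same nan object
print(tuple_eq((float('nan'), 1), (float('nan'), 1)))  # False: two distinct nans
print(tuple_eq((1, 2), (1, 2, 3)))                     # False: lengths differ
```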

You can verify this with the following example:

class A(object):
    def __eq__(self, other):
        print("Checking equality with __eq__")
        return True

a1 = A()
a2 = A()

If you try (a1, 1) == (a1, 1), nothing gets printed, while (a1, 1) == (a2, 1) uses __eq__ and prints out the message.

Now try a1 == a1 and see if it surprises you ;P
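If you would rather check the behaviour programmatically than by reading printed messages, a variant of class A can count its __eq__ calls. The class name CountingEq and its calls counter are illustrative additions, not part of the example above.

```python
class CountingEq:
    calls = 0  # shared counter of __eq__ invocations

    def __eq__(self, other):
        CountingEq.calls += 1
        return True


a1 = CountingEq()
a2 = CountingEq()

same_in_tuple = (a1, 1) == (a1, 1)        # identity shortcut: __eq__ not called
calls_after_same = CountingEq.calls       # 0

different_in_tuple = (a1, 1) == (a2, 1)   # distinct objects: __eq__ called once
calls_after_different = CountingEq.calls  # 1

direct = a1 == a1                         # plain == always calls __eq__
calls_after_direct = CountingEq.calls     # 2
```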

satoru
  • Which seems to make this a bug in `numpy.nan`, because it breaks an assumption made by the core library. – dhke Jun 03 '15 at 08:56
  • @dhke The assumption that identity implies equality is not used everywhere though. For example, `assert numpy.nan != numpy.nan`, in this case the identity check is obviously not used. – satoru Jun 03 '15 at 09:07
  • The example already established that `numpy` does away with *identity is equality*. The Python docs also explicitly state that there is no [*implied relationship*](https://docs.python.org/2/reference/datamodel.html#object.__eq__) between `==` and `!=`, so that seems fine to me. What still bothers me is where else the simple optimization in `PyObject_RichCompareBool` causes trouble. Because on the other hand `None == None` ... – dhke Jun 03 '15 at 09:17
  • I tried `a1 == a1` and got `Checking equality with __eq__\nTrue`. This did not surprise me. – Ethan Furman Jun 03 '15 at 15:16
  • `float('NaN') is not float('NaN')` because two different floats were created. `my_nan = float('NaN'); my_nan is my_nan` would be `True`. – Ethan Furman Jun 17 '15 at 22:40

Tuples check identity first, and fall back to equality only when the items are not identical.

(float('nan'),) == (float('nan'),)

is False simply because each float('nan') call creates a different object instance. If you do this instead:

x = float('nan')
print((x,) == (x,))

you will get True, even though x == x is False, because x is x is True.

numpy.nan is a single shared instance, so both tuples contain the very same object, and that's why it "doesn't work".
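The difference between one shared instance and freshly created instances can be shown with the standard library's math.nan, assumed here as a stand-in for numpy.nan because it is likewise a single module-level float object.

```python
import math

shared = math.nan        # the module attribute is the same object on every access
fresh_a = float('nan')   # each float('nan') call builds a new object
fresh_b = float('nan')

print(math.nan is shared)          # True: one shared instance
print((math.nan,) == (shared,))    # True: identity shortcut, nan == nan never runs
print(fresh_a is fresh_b)          # False: two separate objects
print((fresh_a,) == (fresh_b,))    # False: falls back to nan == nan
```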

As a wild guess this "shortcut" of checking identity first is done for performance reasons.

6502