25

If we make a pathological potato like this:

>>> class Potato:
...     def __eq__(self, other):
...         return False
...     def __hash__(self):
...         return random.randint(1, 10000)
... 
>>> p = Potato()
>>> p == p
False

We can break sets and dicts this way (note: it's the same even if __eq__ returns True, it's mucking with the hash that broke them):

>>> p in {p}
False
>>> p in {p: 0}
False

Also len({p: 0, p: 0}) == 2, and {p: 0}[p] raises KeyError, basically all mapping related stuff goes out the window, as expected.

But what I didn't expect is that we can't break lists

>>> p in [p]
True

Why is that? It seems that list.__contains__ iterates, but it's first checking identity before checking equality. Since it is not the case that identity implies equality (see for example NaN object), what is the reason for lists short-circuiting on identity comparisons?

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
wim
  • 338,267
  • 99
  • 616
  • 750
  • Maybe `list.__contains__` compares objects by `id()` instead of `eq()` ? `(id(p) == id(p)) is True` – Håken Lid Apr 17 '15 at 07:01
  • 1
    @jonrsharpe OP already knows about it. I think he wants to understand why List checks object identity first instead of equality, I guess. – thefourtheye Apr 17 '15 at 07:01
  • @HåkenLid yes, that's what it does, I think the OP is asking *why*. – jonrsharpe Apr 17 '15 at 07:02
  • To save others the trouble, same in Python 2 new-style (object) and in Python 3.4 – cdarke Apr 17 '15 at 07:04
  • Presumably because `x in [x]` asks *"is this object in that list"* (which it is) not *"are any items in that list equal to this object"* (which would be `any(y == x for y in [x])`). – jonrsharpe Apr 17 '15 at 07:06
  • @jonrsharpe `1 in [1.]` has to be true though, so I think identity is neither sufficient nor required – wim Apr 17 '15 at 07:08
  • Yes, which is why `__eq__`ality is checked *as a fallback*. Equal identity is sufficient, unequal identity isn't. – jonrsharpe Apr 17 '15 at 07:09
  • Identity *is* sufficient. The fact that `1 in [1.]` is true just means identity is not necessary. – BrenBarn Apr 17 '15 at 07:12
  • The 2 answers here are enlightening but the actual question here is why? Is it a performance optimisation only, or is there a valid logical reason for lists to take this shortcut? Is the behaviour shown correct python, or an implementation detail of cpython? – wim Apr 17 '15 at 08:04
  • @wim: Did you read my updated answer? And check the link that BrenBarn gave (which is where my update came from) ? The why is to keep certain invariants true. – Ethan Furman Apr 17 '15 at 08:13
  • 1
    @wim per the bug comments linked by BrenBarn, [`for a in container: assert a in container # this should ALWAYS be true`](http://bugs.python.org/issue4296#msg75735) is a good summary of the logical reason. – jonrsharpe Apr 17 '15 at 08:15
  • 1
    The problem is that identity **should** imply equality. The NaN object didn't do this only because the standard says Syu, but by all means it's a very bad idea to make a type that doesn't follow that implication. And if you do you should 1) know that you really cannot do otherwise and 2) be prepared to break some code. – Bakuriu Apr 17 '15 at 08:30

2 Answers2

12

list, tuple, etc., does indeed do an identity check before an equality check, and this behavior is motivated by these invariants:

assert a in [a]
assert a in (a,)
assert [a].count(a) == 1
for a in container:
    assert a in container    # this should ALWAYS be true

Unfortunately, dicts, sets, and friends operate by hashes, so if you mess with those you can indeed effectively break them.

See this issue and this issue for some history.

Ethan Furman
  • 63,992
  • 20
  • 159
  • 237
  • 1
    I don't understand what you mean by saying "this behavior is motivated by NaNs". Why would a behavior that causes strange behavior with NaNs be motivated by NaNs? That makes it sound like they made it that way specifically to break NaNs. – BrenBarn Apr 17 '15 at 07:23
8

In general, breaking the assumption that identity implies equality can break a variety of things in Python. It is true that NaN breaks this assumption, and thus NaN breaks some things in Python. Discussion can be found in this Python bug. In a pre-release version of Python 3.0, reliance on this assumption was removed, but the resolution of the bug was to put it back in (i.e., make Python 3 give the same behavior as Python 2, in which the identity check shortcut is done). The documentation for Python 3 correctly says:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).

However, it appears the documentation for Python 2 is incorrect, since it says:

For the list and tuple types, x in y is true if and only if there exists an index i such that x == y[i] is true.

You could raise a documentation bug about this if you want, although it is a pretty esoteric issue so I doubt it will be high on anyone's priority list.

thefourtheye
  • 233,700
  • 52
  • 457
  • 497
BrenBarn
  • 242,874
  • 37
  • 412
  • 384
  • 2
    Isn't that documentation in 3.x wrong for e.g. `set`, as the OP documents? That checks hashes, not identity. – jonrsharpe Apr 17 '15 at 07:15
  • @jonrsharpe: That's true. I was focusing on the anomalous behavior of the list/tuple case. Using a randomized hash like that is even more pathological than a non-self-equal object, though. – BrenBarn Apr 17 '15 at 07:18
  • Now I'm confused - I was running on Code2Go for iPhone, where `p in {p}` was `True` for 2.7.9 and 3.4.2. Now I'm on a Yosemite MacBook, where `p in {p}` is `False` for 2.7.9 and 3.4.3. – jonrsharpe Apr 17 '15 at 07:30
  • @jonrsharpe: An object that doesn't have a constant hash is broken. Python does not guard against every silly thing that can be done wrong. – Ethan Furman Apr 17 '15 at 07:31
  • @EthanFurman I have no problem with that, but I'd expect consistent behaviour as documented, at least. – jonrsharpe Apr 17 '15 at 07:32
  • @jonrsharpe: I think any documentation should be considered accurate so far as correct objects are used -- if someone deliberately creates a broken object, then they own all the pieces ;) . Littering documentation with notes about how this or that may fail when presented with this or that broken thing would be a huge drain on volunteer time, as well as creating a bunch of noise for everybody else who /isn't/ creating broken objects. – Ethan Furman Apr 17 '15 at 07:51
  • 1
    @EthanFurman I agree with that up to a point, but *"`x in y` is equivalent to `any(x is e or x == e for e in y)`"* seems pretty unambiguous - identity is checked first, so `p in {p}` should be `True` *however it is implemented* (as you can't override identity). Python is a very dynamic language, and assuming pretty much anything about the implementation of `x` is foolish, but if that's **not** actually equivalent to the underlying implementation it should at least be caveated. – jonrsharpe Apr 17 '15 at 07:54
  • @EthanFurman: Sure, but you need to document what "broken" means. Is there documentation saying that not having a constant hash means broken? – BrenBarn Apr 17 '15 at 08:01
  • @BrenBarn: Yup, finally found it: https://docs.python.org/3/glossary.html#term-hashable – Ethan Furman Apr 17 '15 at 08:05
  • 2
    @jonrsharpe: [bug created](http://bugs.python.org/issue23987) -- at least the resolution will be documented. – Ethan Furman Apr 17 '15 at 08:10