Slow equality evaluation for identical objects (x == x)

Question

Is there any reason x == x is not evaluated quickly? I was hoping __eq__ would check if its two arguments are identical, and if so return True instantly. But it doesn't do it:

s = set(range(100000000))
s == s # this doesn't short-circuit, so takes ~1 sec

For built-ins, x == x always returns True I think? For user-defined classes, I guess someone could define __eq__ that doesn't satisfy this property, but is there any reasonable use case for that?

The reason I want x == x to be evaluated quickly is because it's a huge performance hit when memoizing functions with very large arguments:

from functools import lru_cache
@lru_cache()
def f(s):
    return sum(s)
large_obj = frozenset(range(50000000))
f(large_obj) # this takes >1 sec every time

Note that the reason @lru_cache is repeatedly slow for large objects is not because it needs to calculate __hash__ (this is only done once and is then hard-cached as pointed out by @jsbueno), but because the dictionary's hash table needs to execute __eq__ every time to make sure it found the right object in the bucket (equality of hashes is obviously insufficient).

UPDATE:

It seems it's worth considering this question separately for three situations.

1) User-defined types (i.e., not built-in / standard library).

As @donkopotamus pointed out, there are cases where x == x should not evaluate to True. For example, for numpy.array and pandas.Series types, the result is intentionally not convertible to boolean because it's unclear what the natural semantics should be (does False mean the container is empty, or does it mean all items in it are False?).

But here, there's no need for python to do anything, since the users can always short-circuit x == x comparison themselves if it's appropriate:

def __eq__(self, other):
  if self is other:
    return True
  # continue normal evaluation

2) Python built-in / standard library types.

a) Non-containers.

For all I know the short-circuit may already be implemented for this case - I can't tell since either way it's super fast.

b) Containers (including str).

As @Karl Knechtel commented, adding short-circuit may hurt total performance if the savings from short-circuit are outweighed by the extra overhead in cases where self is not other. While theoretically possible, even in that case the overhead is a small in relative terms (container comparison is never super-fast). And of course, in cases where short-circuit helps, the savings can be dramatic.

BTW, it turns out that str does short-circuit: comparing huge identical strings is instant.

'x == x' is NOT always True for built-ins - in particular, NaN (Not A Number) is never equal to anything, not even itself. — jasonharper, Aug 05 '16 at 00:40
My best guess is "because it would slow down every other comparison, and is thus not optimizing for the common case". But when it's a matter of O(1) vs O(N), and there are practical reasons for the "rare" case to come up frequently in some context like this, I'd side with you in saying an exception should be made. NaN is an exception to that exception, though... it's entirely arguable that it shouldn't be unique and that other user-defined classes might have their own reasons. — Karl Knechtel, Aug 05 '16 at 00:40
@Veedrac I would, but not in `lru_cache` and not inside the dictionary hash table algorithm.. — max, Aug 05 '16 at 00:58
`lru_cache` saves by default only the 128 least recently accessed results. If you have a great number of results to cache: Have you tried to call `lru_cach(maxsize=None)`? — chapelo, Aug 05 '16 at 01:05
@Veedrac not too easy. If you know of a clean way to do it that *always* works, please add your answer [here](http://stackoverflow.com/questions/12725871/memoization-when-arguments-can-be-very-large/12726843#12726843), I'd really appreciate. — max, Aug 05 '16 at 02:58
@max I don't understand what you mean by "*always* works". What's problematic about memoizing based on the wrapped values? — Veedrac, Aug 05 '16 at 15:04

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

5

As you say, someone could quite easily define an __eq__ that you personally don't happen to approve of ... for example, the Institute of Electrical and Electronics Engineers might be so foolish as to do that:

>>> float("NaN") == float("NaN")
False

Another "unreasonable use case":

>>> bool(numpy.ma.masked == numpy.ma.masked)
False

Or even:

>>> numpy.arange(10) == numpy.arange(10)
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

which has the audacity to not even be convertible to bool!

So there is certainly practical scope for x == x to not automagically be short-circuited to be true.

Going Off Course

However the following is perhaps a good question:

Why doesn't set.__eq__ check for instance identity?

Well, one might think ... because a set S might contain NaN and since NaN cannot equal itself then surely such a set S cannot equal itself? Investigating:

>>> s = set([float("NaN")])
>>> s == s
True

Hmm, that's interesting, especially since:

>>> {float("NaN")} == {float("NaN")}
False

This behaviour is due to Python's desire for sequences to be reflexive.

edited Jun 20 '20 at 09:12

Community

1
1

answered Aug 05 '16 at 00:40

donkopotamus

22,114
2
48
60

+1 I don't know how I forgot about NaN, I even complained about this very thing in my own [SO answer with the largest number of downvotes](http://stackoverflow.com/a/10059796/336527)... /facepalm – max Aug 05 '16 at 00:56
1

To answer the question you pose: `[float('NaN')] == [float('NaN')]` – Ryan Haining Aug 05 '16 at 00:58
3

@RyanHaining Amusingly, I was just writing about your suggestion ... you should investigate: `L = [float('NaN')]; L == L` versus `[float('NaN')] == [float('NaN')]` – donkopotamus Aug 05 '16 at 00:59
@donkopotamus now I'm confused, perhaps it's checking with `is` after all – Ryan Haining Aug 05 '16 at 01:16
looks like a whole new question – Ryan Haining Aug 05 '16 at 01:17
@RyanHaining It does ... [here](http://stackoverflow.com/questions/38779705/comparison-of-collections-containing-nan) it is – donkopotamus Aug 05 '16 at 01:18
But given that Python's container types force reflexive equality on their contents, why don't they optimize reflexive equality tests on themselves? Is there any situation where `x == x` is `False` when `x` is an instance of one of the builtin container types? I don't know of any such situation, even when NaNs are involved. – Blckknght Aug 05 '16 at 02:35

Slow equality evaluation for identical objects (x == x)

1 Answers1

Going Off Course