Is there any reason x == x is not evaluated quickly? I was hoping __eq__ would check whether its two arguments are identical and, if so, return True instantly. But it doesn't:
s = set(range(100000000))
s == s # this doesn't short-circuit, so takes ~1 sec
For built-ins, x == x always returns True, I think? For user-defined classes, I guess someone could define an __eq__ that doesn't satisfy this property, but is there any reasonable use case for that?
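One built-in case worth checking against the "always True" guess is float NaN. The snippet below is a small sketch, not a full survey of built-in types. It also shows that list comparison treats an element that is the very same object as equal, without consulting its __eq__.

```python
# NaN is a built-in value for which x == x is False (IEEE 754 semantics).
nan = float("nan")
scalar_equal = nan == nan

# List comparison checks element identity before calling __eq__,
# so two lists holding the *same* NaN object compare equal.
list_equal = [nan] == [nan]

print(scalar_equal, list_equal)
```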
The reason I want x == x to be evaluated quickly is that it's a huge performance hit when memoizing functions with very large arguments:
from functools import lru_cache

@lru_cache()
def f(s):
    return sum(s)

large_obj = frozenset(range(50000000))
f(large_obj)  # this takes >1 sec every time
Note that the reason @lru_cache is repeatedly slow for large objects is not that it needs to calculate __hash__ (this is done only once and is then cached, as pointed out by @jsbueno), but that the dictionary's hash table needs to execute __eq__ every time to make sure it found the right object in the bucket (equality of hashes is obviously insufficient).
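To illustrate why equal hashes are not enough, here is a minimal sketch on CPython. CountingKey is a hypothetical class made up for this example: every instance deliberately hashes to the same value, so the dict has to call __eq__ to tell keys apart.

```python
class CountingKey:
    """Hypothetical key type: all instances share one hash value."""

    eq_calls = 0  # how many times __eq__ has been invoked

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # deliberate collision for every instance

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value


table = {CountingKey(1): "a"}

# Same hash, different value: only __eq__ can reject this key.
miss = table.get(CountingKey(2))
calls_after_miss = CountingKey.eq_calls

# Same hash, equal value, but a distinct object: __eq__ confirms the match.
hit = table.get(CountingKey(1))
calls_after_hit = CountingKey.eq_calls

print(miss, hit, calls_after_miss, calls_after_hit)
```

Both lookups use freshly created key objects, so neither can be resolved by hash alone.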
UPDATE:
It seems it's worth considering this question separately for three situations.
1) User-defined types (i.e., not built-in / standard library).
As @donkopotamus pointed out, there are cases where x == x should not evaluate to True. For example, for numpy.ndarray and pandas.Series, the result of the comparison is intentionally not convertible to a boolean, because it's unclear what the natural semantics should be (does False mean the container is empty, or does it mean all items in it are False?).
But here, there's no need for Python to do anything, since users can always short-circuit the x == x comparison themselves if it's appropriate:
def __eq__(self, other):
    if self is other:
        return True
    # continue normal evaluation
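Filled out into a complete class, the idea might look like the sketch below. Bag and its items attribute are hypothetical names chosen for illustration; the point is only the identity check at the top of __eq__.

```python
class Bag:
    """Hypothetical immutable container with an identity fast path."""

    def __init__(self, items):
        self.items = frozenset(items)

    def __eq__(self, other):
        if self is other:
            return True  # same object: skip the element-wise comparison
        if not isinstance(other, Bag):
            return NotImplemented  # let Python try the other operand
        return self.items == other.items

    def __hash__(self):
        # Defining __eq__ disables the inherited __hash__, so restore it
        # in a way that is consistent with equality.
        return hash(self.items)


b = Bag(range(10))
print(b == b, b == Bag(range(10)), b == Bag(range(5)))
```

Defining __hash__ alongside __eq__ keeps the class usable as a dict key or as an lru_cache argument.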
2) Python built-in / standard library types.
a) Non-containers.
For all I know the short-circuit may already be implemented for this case - I can't tell since either way it's super fast.
b) Containers (including str).
As @Karl Knechtel commented, adding a short-circuit may hurt total performance if the savings from the short-circuit are outweighed by the extra overhead in cases where self is not other. While that is theoretically possible, even in that case the overhead is small in relative terms (container comparison is never super-fast). And of course, in cases where the short-circuit helps, the savings can be dramatic.
BTW, it turns out that str does short-circuit: comparing huge identical strings is instant.
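A rough way to compare the two cases yourself is the timing sketch below. The size n is an assumption picked to keep the run short (much smaller than in the examples above), time_self_eq is a helper made up for this snippet, and the absolute numbers will vary by machine and Python version.

```python
import timeit

n = 1_000_000  # assumed size, smaller than in the question
big_str = "x" * n
big_set = set(range(n))


def time_self_eq(obj, repeats=5):
    """Best-of-N wall time, in seconds, for a single obj == obj."""
    return min(timeit.repeat(lambda: obj == obj, number=1, repeat=repeats))


str_time = time_self_eq(big_str)
set_time = time_self_eq(big_set)

print(f"str == itself: {str_time:.6f} s")
print(f"set == itself: {set_time:.6f} s")
```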