When does eq gets called using hash()?

Question

As mentioned here,

Below code,

class Person(object):
     def __init__(self, name, ssn, address):
         self.name = name
         self.ssn = ssn
         self.address = address
     def __hash__(self):
         print('in hash')
         return hash(self.ssn)
     def __eq__(self, other):
         print('in eq')
         return self.ssn == other.ssn

bob = Person('bob', '1111-222-333', None)

jim = Person('jim bo', '1111-222-333', 'sf bay area')


dmv_appointments = {}
print('calling hash')
dmv_appointments[bob] = 'tomorrow'
print('calling hash')
print(dmv_appointments[jim])
print('calling hash again')
print(dmv_appointments[bob])

Output:

calling hash
in hash
calling hash
in hash
in eq
tomorrow
calling hash again
in hash
tomorrow

Question:

Why __eq__ gets called on accessing jimbut not on bob?

My guess is that it's first checking the hash, then _identity_ (being faster), and only _then_ equality. Thus, when you look up `bob` it does not have to check `__eq__` to see whether it's the right entry, but for `jim` it does. — tobias_k, Jun 13 '17 at 20:37
Related question: https://stackoverflow.com/q/40917986/1639625 A comment says something about how it's done in the C implementation. — tobias_k, Jun 13 '17 at 20:43
@tobias_k: Yeah, that's a CPython (the Python reference interpreter) implementation detail due to [using `PyObject_RichCompareBool` as the standard method for comparing objects at the C layer](https://docs.python.org/3/c-api/object.html#c.PyObject_RichCompareBool). Using that function for equality checking is effectively equivalent to `return x is y or x == y` at the Python level. — ShadowRanger, Jun 13 '17 at 21:11
As an aside, your class uses `ssn` as identity as far as `__hash__` and `__eq__` are concerned so your test data using the same `ssn` for two objects is an error case. The `dict` cannot hold both of these objects at the same time. It would make sense to check for this in the constructor and error out. — tdelaney, Jun 13 '17 at 22:52

Willem Van Onsem · Accepted Answer · 2017-06-13T21:11:31.687

Short answer: a dictionary lookup first does a (cheap) reference equality check (x is y) when searching a bucket, and only if that fails, a (more expensive) equality check (x == y) is done.

Scenario

The __hash__ function does not call __eq__ internally. Given you construct bob and jim, no such methods are called.

Next you associate bob with 'tomorrow'. In order to know in which bucket of the dictionary, you have to store bob, you calculate the hash. Now once you have done that we store bob (and the value in the correct bucket).

Next we want to obtain jim. In order to know in which bucket jim resides, we calculate the hash. Next we start searching in the bucket. The bucket will contain bob. We first perform a reference check (jim is bob) but that fails, so then we fallback on the equality check. That check succeeds, so we return the value corresponding with bob: 'tomorrow'.

The same scenario happens when we want to look for bob: we calculate the hash, fetch the bucket. Perform a reference check on bob is bob, and that one succeeds. So we do not need a (probably more expensive equality check). We simply return the value 'tomorrow'.

Reference checks

The fact that a reference check is done first can be proven with the following (unhealthy) code:

class Person(object):
     def __init__(self, name, ssn, address):
         self.name = name
         self.ssn = ssn
         self.address = address
     def __hash__(self):
         print('in hash')
         return hash(self.ssn)
     def __eq__(self, other):
         print('in eq')
         return False

Here we return always False for equality. So even:

>>> bob == bob
in eq
False
>>> bob is bob
True

bob is not equal to itself (this is actually not good design, since for a dictionary, it is a contract that an object is equal to itself: a good equality relation is reflexive, symmetrical and transitive). Nevertheless, if we associate bob with 'tomorrow', we are still able to fetch the value associated with bob:

>>> dmv_appointments = {}
>>> dmv_appointments[bob] = 'tomorrow'
in hash
>>> dmv_appointments[bob]
in hash
'tomorrow'

It's not generally required that all objects equal themselves. `float('nan')` and `decimal.Decimal('nan')` don't, for example, and those types come with Python. It *is* required that `==` is an equivalence relation for dict keys, so using one of those `Person` objects as a dict key breaks the contract for dict keys. — user2357112, Jun 13 '17 at 21:06
@user2357112: yeah there are some exceptions, because usually the see `NaN` as a "corrupt" result. Two `nan`s that originate from different computations are usually not equal. But in general reflexivity, symmetry and transitivity are properties of a sound equality relation. — Willem Van Onsem, Jun 13 '17 at 21:10

MSeifert · Answer 2 · 2017-06-13T21:39:11.847

To answer the title:

When does __eq__ gets called using hash()?

Never.

The other question:

Why __eq__ gets called on accessing jim but not on bob?

That's more complicated. To understand that you need to know how a dictionary is implemented. Assuming CPython it will be a table containing a hash column, a key column and a value column:

hash     | key       | value
-----------------------------------------
    -    |      -    |        -
-----------------------------------------
    -    |      -    |        -

It will have a certain size but it won't be big enough to contain every possible hash value, so it will calculate the position based on the hash. For example if you add bob it could have (string hashes are randomized in certain CPython versions so the actual result will differ) a hash of 7475314405837642385. Assuming the dictionary has an actual size of 2 (in reality it will be bigger, but that would unnecessarily waste space in the answer) it just takes the modulo, so it will place it in 7475314405837642385 % 2 == 1:

hash     | key       | value
-----------------------------------------
    -    |      -    |        -
-----------------------------------------
747...385|    bob    |    'tomorrow'

When you want to look up a key it

always calculate the hash of the lookup
then it will calculate the position.
Then it compares the hash of the lookup to the hash saved in that position
if the hashes are equal then it will compare the lookup and the saved key with PyObject_RichCompareBool. That will:
- first check for identity: lookup is key
- if that is False it will check lookup == key

So in case you lookup bob:

hash: 7475314405837642385
position: 7475314405837642385 % 2 -> 1
found an entry, so compare the hashes: 7475314405837642385 == 7475314405837642385
that was equal so check for identity: bob is bob -> True

So it returns 'tomorrow' without an equality check. In the second case it checks for jim:

hash: 7475314405837642385
position: 7475314405837642385 % 2 -> 1
found an entry, so compare the hashes: 7475314405837642385 == 7475314405837642385
that was equal so check for identity: jim is bob -> False
check for equality jim == bob -> True

So it returns 'tomorrow'.

This is just an approximation of the actual implementation (it's missing some details). It gets more complicated if the hashes are not equal or the lookup is not key and lookup != key but these are not really important to understand the observed behavior you questioned.

However, I really need to say this: What you're doing is really dangerous because your class isn't immutable. You could accidentally make the saved dictionary entry unavailable to you:

dmv_appointments = {bob: 1}
bob.ssn = '1'     # changing ssn changes the hash!
dmv_appointments[bob]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-35-3920ada7bab1> in <module>()
     15 dmv_appointments = {bob: 1}
     16 bob.ssn = '1'
---> 17 dmv_appointments[bob]

KeyError: <__main__.Person object at 0x000001BD5DDCC470>

(It could still work in case the new hash is equal to the "old" hash, but that would be rather accidental).

That's because while you alter the hash of your instance - the dictionary won't update the saved hash because it assumes all keys are immutable! So the dictionary would either assume it would be saved in another position or if the position would (miraculously) be the same then it would fail in the step where it contains the actual hashes.

When does eq gets called using hash()?

2 Answers2

Scenario

Reference checks

Linked

When does __eq__ gets called using hash()?

2 Answers2

Scenario

Reference checks

Linked

When does eq gets called using hash()?