How does hash-table in set works in python?

Question

As far as I know, set in python works via a hash-table to achieve O(1) look-up complexity. While it is hash-table, every entry in a set must be hashable (or immutable). So This peace of code raises exception:

>>> {dict()}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

Because dict is not hashable. But we can create our own class inherited from dict and implement the __hash__ magic method. I created my own in this way:

>>> class D(dict):
...     def __hash__(self):
...             return 3
...

I know it should not work properly but I just wanted to experiment with it. So I checked that I can now use this type in set:

>>> {D()}
{{}}
>>> {D(name='ali')}
{{'name': 'ali'}}

So far so good, but I thought that this way of implementing the __hash__ magic method would screw up the look up in set. Because every object of the D has the same hash value.

>>> d1 = D(n=1)
>>> d2 = D(n=2)
>>> hash(d1), hash(d2)
(3, 3)
>>> 
>>> {d1, d2}
{{'n': 2}, {'n': 1}}

But the surprise for me was this:

>>> d3 = D()
>>> d3 in {d1, d2}
False

I expected the result to be True, because hash of d3 is 3 and currently there are values in our set with the same hash value. How does the set works internally?

They hash to the same bucket, which will make your hash-lookups all collisions, **but they aren't equal, i.e. `__eq__`**, so they won't behave incorrectly. Of course, these are mutable objects, which will create *other problems* as well — juanpa.arrivillaga, May 19 '22 at 18:12
The hash is only half the deal, the other halve is equality between the instances. And you have not implemented any equality. — luk2302, May 19 '22 at 18:12
Most algorithms that use hash table have to handle collisions — usually by storing a list of objects that have the same hash and doing a linear search checking for equality when one occurs. — martineau, May 19 '22 at 18:12
"I expected the result to be True, because hash of d3 is 3 and currently there are values in our set with the same hash value. How does the set works internally?" Like a hash set / hash table. But you are making the mistake of thinking that a hash table decides whether something is or isn't in it by using *only* the hash value, it doesn't. — juanpa.arrivillaga, May 19 '22 at 18:13

score 4 · Accepted Answer · answered May 19 '22 at 18:14

To be usable in sets and dicts, a __hash__ method must guarantee that if x == y, then hash(x) == hash(y). But that's a one-sided implication. It's not at all required that if hash(x) == hash(h) then x == y must be true. Indeed, that's impossible to achieve in general (for example, there are an unbounded number of distinct Python ints, but only a finite number of hash codes - there must be distinct ints that have the same hash value).

That your hashes are all the same is fine. They only tell the set/dict where to start looking. All objects in the container with the same hash are then compared, one by one, for equality, until success, or until all such objects have been tried without success.

However, while making all hashes the same doesn't hurt correctness, it's a disaster for performance: it effectively turns the set/dict into an exceptionally slow way to do an O(n) linear search.

It essentially turns it into a list where you do `if new_element not in the_set: the_set.append(new_element)` — Barmar, May 19 '22 at 18:17
@Barmar, it's worse than that. Same `O(n)` worst-case behavior, but at least a linear search through a list accesses memory in predictable order. The hash collision strategies for sets/dicts leap all over the container from one try to the next. — Tim Peters, May 19 '22 at 18:21

How does hash-table in set works in python?

1 Answers1

Linked