How does hashing work for python sets

Question

I am completely familiar on how hashtables and hashes work but I am trying to fully understand how the O(1) completely comes from.

set1 = {'s','t'}
print('x' in set1)
print('s' in set1)
set2 = {'s'}
print('s' in set2)

I am told that to check if 's' is in set1, if will check the memory allocation of the hash of 's', and check if it is in set1 in O(1) and return the boolean. Therefore two O(1) operations, but my question is: How does the hashes actually work indepth. What I mean by this is, when you hash 's', does that hash have something like set1 and set2 and you're checking if set1 is either set1 or set2, or does each set have a different hash of 's' and you're checking the hash of 's' for each different set.

I'm a bit confused by your question... Can you please try to improve it? — Willem Van Onsem, Jun 30 '16 at 12:59
Hashing `'s'` means doing `hash('s')`. `set1` and `set2` have nothing to do with that. The `set` type *uses* the result of that operation to obtain an index in its internal table. — Bakuriu, Jun 30 '16 at 13:05
All hashable objects have a hash method. The set uses the hash to, among other things, determine if the object is unique in the set. When you test `'x' in set1` -- You are testing if `'x'` is a member of `set1` — sytech, Jun 30 '16 at 13:14
" In a set, Python keeps track of each hash, and when you if if x in values:, Python will get the hash-value for x, look that up in an internal structure and then only compare x with the values that have the same hash as x." This is the sentence I was looking for in my possible duplicate page, but I do not fully understand it. Could someone re-word it maybe? How do you only compare things with the same hash? That's like comparing everything in the set like a list but only checking if theyre the same hash. o > O — memelord23, Jul 01 '16 at 00:59

sytech · Answer 1 · 2020-06-26T01:10:10.610

An (immutable) object will always have the same hash for its entire lifetime. The hash will be the same between set1 and set2 -- This allows for operations like union and intersection to take place between sets, as well as determining if a member is unique within the set, among other things.

>>> first_dict = {"s":"s".__hash__(), "t":"t".__hash__()}
>>> second_dict = {"s":"s".__hash__()}
>>> first_dict
{'s': -638743784, 't': 711199131}
>>> second_dict
{'s': -638743784}

You can have identical strings that are different objects in memory. But since they're identical strings, they will hash the same.

>>> id(a)
7858120
>>> id(b)
7858624
>>> a.__hash__()
-1117164856
>>> b.__hash__()
-1117164856

An object is hashable if it has a hash value which never changes during its lifetime (it needs a hash() method), and can be compared to other objects (it needs an eq() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().

Moon Cheesez · Answer 2 · 2016-06-30T13:27:48.170

2

When Python creates a string, it is based on the string value. Therefore, both 's' and 's' assigned to different variables have equal hashes. Hence:

>>> a = 's'
>>> b = 's'
>>> hash(a) == hash(b)
True

However, note that they may not always point to the same string in memory:

>>> a = 'this is some really long string'
>>> b = 'this is some really long string'
>>> id(a)
47002352L
>>> id(b)
47002608L

edited Jun 30 '16 at 13:27

answered Jun 30 '16 at 13:21

Moon Cheesez

2,489
3
24
38

score 1 · Answer 3 · edited May 23 '17 at 12:22

1

I suggest you read this previous question, It will explain in details the hashing in Python.

edited May 23 '17 at 12:22

Community

1
1

answered Jun 30 '16 at 13:15

Gal Dreiman

3,969
2
21
40

score 0 · Answer 4 · answered Jun 30 '16 at 13:09

The Python dict is essentially a hash table. Essentially the keys are transformed into table positions by a hashing function; since multiple keys may has to the same value there has to be some mechanism for detecting collisions (multiple keys with the same has) and moving down a sequence of alternate entries until the correct key is located.

For a dict the key cell, when found, is associated with a value. For a set there are no values: a key either belongs to the set or it doesn't.

The hashing behaviour therefore allows most possible values to be eliminated in constant time (assuming collisions aren't problematically frequent, and Python takes care to reallocate storage whenever that might happen).

How does hashing work for python sets

4 Answers4

Linked