Towards understanding dictionaries

Question

I am required to use multiple hashtables, so in c++, I would normally use an std::unordered_map. So far I can understand that I can use a dictionary in Python, so let's assume the following code:

my_dict_1 = {}
my_dict_1['foo'] = 1
my_dict_2 = {}
my_dict_2['foo'] = 2

Will the two dictionaries be using different hash functions (notice that the key is the same), thus they can be considered two different hash tables (I mean that they will actually store the data differently)?

EDIT:

Yes the dictionaries are two different objects of course, but the question is about the technique that they will use to store the data!

@JohnColeman thanks for the upvote. Can you please point to some documentation saying so? I am very new in [tag:Python] and it's not that clear for me. :/ What you are saying is that every dictionary has its own hashfunction, right? I wonder if they are *independent* too, but this is for later.. :) — gsamaras, May 08 '16 at 13:37
What do you mean by "different hash functions"? A hash function is a piece of code (and it is implementation-defined). They are definitely different dictionaries (hash tables), though (since they are by construction two different objects). It has not really anything to do with dictionaries specifically, btw. — import this, May 08 '16 at 13:48
I am mean the hash function every dictionary will use is different from the other. If use a dictionary as a hashtable and then another one, I would like to know that they will store the data differently, because of a different hush function. Hope it is clearer now @importthis. — gsamaras, May 08 '16 at 13:50
I'm really curious why you ask about this - why would the hash function be different, and how would it be different? The point of the hash is to quickly test if two keys are the same, what would be a benefit of having a different hash for each dictionary? — TessellatingHeckler, May 08 '16 at 15:23
@TessellatingHeckler because I was hoping to use them for LSH: http://stackoverflow.com/questions/37089971/confusion-in-hashing-in-lsh — gsamaras, May 08 '16 at 19:32

score 3 · Accepted Answer · edited May 08 '16 at 14:01

3

A simple Python shell experiment to show that different dictionaries can use the same key:

>>> my_dict_1 = {'foo':1}
>>> my_dict_2 = {'foo':2}
>>> my_dict_1,my_dict_2
({'foo': 1}, {'foo': 2})

This is a good discussion of how it is implemented. The key point is that each dictionary is allocated its own portion of memory (which can of course grow as needed). The exact same hash function is used for both dictionaries, but is being used to probe different areas in memory.

edited May 08 '16 at 14:01

gsamaras

71,951
46
188
305

answered May 08 '16 at 13:44

John Coleman

51,337
7
54
119

The exact same hash function? Damn, then that means that if we feed both dictionaries with very same data, they will store them in the very same way (in different memory locations, but I do not care about that), thank you John!! – gsamaras May 08 '16 at 13:54
2

@gsamaras: I just browsed through the source code, and it certainly seems the hash is only calculated through the *key*, and not, for example, salted with the dict name or something like that. Also, "Hashes for integers are the same as the integer (hash(100) == 100), but that is an implementation detail and could change without warning." (https://mail.python.org/pipermail/tutor/2003-August/024555.html) – Jongware May 08 '16 at 22:34

bwangel · Answer 2 · 2016-05-08T14:21:20.310

id(...)

id(object) -> integer

Return the identity of an object. This is guaranteed to be unique among simultaneously existing objects. (Hint: it's the object's memory address.)

Above is the id doc string, it says that object's identity is object's memory address, so we can use the id function to see the variable's memory address:

In your program, I can see the address like this:

def main():
    my_dict_1 = {}
    my_dict_1['foo'] = 1
    my_dict_2 = {}
    my_dict_2['foo'] = 2
    print(hex(id(my_dict_1['foo'])))
    print(hex(id(my_dict_2['foo'])))

if __name__ == '__main__':
    main()

This program output this:

0x9a33e0
0x9a3400

We can see that my_dict_1['foo'] and my_dict_2['foo'] have different memory address.

So I think the two dicts should use the same hash function, but the variable's memory address should be the sum of the hash value and a base value. In this way, the two variable will be stored in the different memory areas.

There is an answer mentioning pretty much the same thing. It is deleted now, thus you can't see it. Thanks though, +1 for an analytic answer, but my edit is really clear, I think. — gsamaras, May 08 '16 at 14:20
These are just the addresses of the integers `1` and `2`, independent of whether they're referred to from one or several dicts or not. (Try it: put `print(hex(id(1)))` into the same program.) — das-g, May 08 '16 at 20:02

Towards understanding dictionaries

2 Answers2

Linked