0

usage

(1)

x = {}
y = 1
x[y] = "a" (1)
y = [1, 2]
x[y] = "a" (2)

The (2) in the above code is definitely wrong since list is an unhashable type. I googled why list is unhashable? It seems that the logic is that if you were to look something up in a dictionary, you would except the entries' names not to change. Since list is mutable we cannot hash it. Keys must be immutable.

Here comes my question! The (1) in the above code is totally Okay. But actually, y in (1) is also mutable.

(2)

class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
x = {}
y = TreeNode(1)
x[y] = "a"

For the above code, how does the Python compute the hash code for the TreeNode object? I've checked the document. TreeNode inherits from Object which means it also inherits the __hash__ function. But I failed to find the description about default calculation of hash code. I thought the compiler may use each object's id?

Detail Implementation

Based on this article, Cpython adopts open addressing to solve the collision instead of using linked list. The reason is that using a linked list to store the pairs having the same hash but it would increase the lookup time e.g. not O(1) anymore. I can understand this part. But it seems that using open addressing also increase the lookup time since you need to prob an available slot. The author in the article said: "adding a key/value pair might take more time because of the probing but the lookup will be O(1) and this is the desired behavior."

Can anyone explain why open addressing make the lookup still O(1)? Or we can change this question a little bit. What's the prons and cons of comparing open addressing and linked list used for solving collisions?

Fihop
  • 3,127
  • 9
  • 42
  • 65
  • "But actually, y in (1) is also mutable." Are you saying that integers are mutable? I don't think that's true. Can you provide an example where an integer mutates? Remember, assignment doesn't count as mutation, that's just rebinding a new value to a name. – Kevin Aug 05 '15 at 16:57
  • @Kevin, you are right! I got it – Fihop Aug 05 '15 at 16:57
  • "We could use a linked list to store the pairs having the same hash but it would increase the lookup time e.g. not O(1) anymore." - I disagree with this. – matino Aug 05 '15 at 16:58
  • Related: [are user defined classes mutable](http://stackoverflow.com/q/12076445) – Martijn Pieters Aug 05 '15 at 16:59
  • *"I thought the compiler may use each object's id?"* - try `hash(y) == id(y)` and find out. – jonrsharpe Aug 05 '15 at 16:59
  • @Fihop You can't hash mutable types as it defeats the purpose. And integer is not a mutable type. – Ozgur Vatansever Aug 05 '15 at 17:00
  • @jonrsharpe, they are different. I've tried. Thanks – Fihop Aug 05 '15 at 17:00
  • @MartijnPieters - it's not a duplicate, the question is different - "Can anyone explain why open addressing make the lookup still O(1)?" – matino Aug 05 '15 at 17:00
  • 3
    Last but not least: please stick to **one** question per post. Your second question can be answered by several other pre-existing questions on Stack Overflow. Open addressing is a *best case* O(1) implementation, worst case O(K) where K is the number of slots. In practice you'll have very few collisions, which is a *average case* O(1) solution. – Martijn Pieters Aug 05 '15 at 17:01
  • @matino: there are two different questions in the post. If the OP wanted to have just the 'detailed' part answered then the question should be narrowed down to just that part. Not that it isn't a duplicate. – Martijn Pieters Aug 05 '15 at 17:02
  • @matino: the other option was to close the question as *too broad*, but because there is a handy duplicate available for at least one part I opted to close it as a duplicate instead. – Martijn Pieters Aug 05 '15 at 17:04
  • @jonrsharpe: the [default hash implementation](https://hg.python.org/cpython/file/6987a9c7dde9/Objects/object.c#l1080) uses the same *information* as the `id()` function, but the value produced is not the same (`id()` returns the value produced by [`PyLong_FromVoidPtr()`](https://hg.python.org/cpython/file/6987a9c7dde9/Objects/longobject.c#l769)) – Martijn Pieters Aug 05 '15 at 17:10
  • @MartijnPieters I interpreted the OP as saying that it used the id *as the hash*, rather than *for calculating the hash*, that may have been wrong! – jonrsharpe Aug 05 '15 at 17:12
  • Your edit only *broadened* the scope even more. This is not a site to discuss the pros and cons of various hash map implementation choices. – Martijn Pieters Aug 05 '15 at 17:31

0 Answers0