4

I would like to write a class that can be used as a key in a hashable collections (e.g. in a dict). I know that user classes are by default hashable, but using id(self) would be the wrong thing here.

My class holds a tuple as member variable. Deriving from tuple doesn't seem like an option because in my constructor I don't get the same kind of arguments as a tuple constructor. But perhaps that's not a limitation?

What I need is basically the hash of a tuple the way a real tuple would give it.

hash(self.member_tuple) does just that.

The idea here is that two tuples can be equal without their id being equal.

If I implement my __cmp__() as follows:

def __cmp__(self, other):
    return cmp(self, other)

will this automatically resort to hash(self) for the comparison? ... or should I implement it as follows:

def __cmp__(self, other):
    return cmp(self.member_tuple, other)

My __hash__() function is implemented to return the hash of the held tuple, i.e.:

def __hash__(self):
    return hash(self.member_tuple)

Basically, how do __cmp__() and __hash__() interact? I don't know whether in __cmp__() the other will already be a hash or not and whether I should compare against "my" hash (which would be the one of the held tuple) or against self.

So which one is the right one?

Can anyone shed any light on this and possibly point me to documentation?

0xC0000022L
  • 20,597
  • 9
  • 86
  • 152

1 Answers1

6

I'd not use __cmp__ and stick to using __eq__ instead. For hashing that is enough and you don't want to extend to being sortable here. Moreover, __cmp__ has been removed from Python 3 in favour of the rich comparison methods (__eq__, __lt__, __gt__, etc.).

Next, your __eq__ should return True when the member tuples are equal:

def __eq__(self, other):
    if not isinstance(other, ThisClass):
        return NotImplemented
    return self.member_tuple == other.member_tuple

Returning the NotImplemented singleton when the type of the other object is not the same is good practice because that'll delegate the equality test to the other object; if it doesn't implement __eq__ or also returns NotImplemented Python will fall back to the standard id() test.

Your __hash__ implementation is spot-on.

Because a hash is not meant to be unique (it is just a means to pick a slot in the hash table), equality is then used to determine if a matching key is already present or if a hash collision has taken place. As such __eq__ (or __cmp__ if __eq__ is missing) is not called if the slot to which the object is being hashed is empty.

This does mean that if two objects are considered equal (a.__eq__(b) returns True), then their hash values must be equal too. Otherwise you could end up with a corrupted dictionary, as Python will no longer be able to determine if a key is already present in the hash table.

If both your __eq__ and __hash__ methods are delegating their duties to the self.member_tuple attribute, you are maintaining that property; you can trust the basic tuple type to have implemented this correctly.

See the glossary definition of hashable and the object.__hash__() documentation. If you are curious, I've written about how the dict and set types work internally:

Community
  • 1
  • 1
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • thanks. The last paragraph confuses me, though. How do I guarantee that the hash value is the same? Can I assume that when hashing a `tuple`? I cannot possibly know whether the hash of any other given `tuple` is the same as the hash of the member tuple, can I? – 0xC0000022L Dec 12 '14 at 13:13
  • 1
    @0xC0000022L: you are delegating the hash function to the tuple the same way you are delegating the equality test, so you are maintaining that property just fine. If tuple ever started producing different hash values even for tuples that are equal then that'd be a far bigger problem and you can be assured that that'd be fixed *very* promptly. – Martijn Pieters Dec 12 '14 at 13:15