A common and quick way to create a stock __hash__() for any given Python object seems to be to return hash(str(self)), provided the object implements __str__(). Is this efficient, though? Per this SO answer, a hash of a tuple of the object's attributes is "good", but the answer doesn't indicate whether it is the most efficient approach in Python. Or would it be better to implement a __hash__() for each object that uses a real hashing algorithm from this page and mixes the values of the individual attributes into the final value returned by __hash__()?
Pretend I've implemented the Jenkins hash routines from this SO question. Which __hash__() would be better to use?
# hash str(self)
def __hash__(self):
    return hash(str(self))

# hash of tuple of attributes
def __hash__(self):
    return hash((self.attr1, self.attr2, self.attr3,
                 self.attr4, self.attr5, self.attr6))

# jenkins hash
def __hash__(self):
    from jenkins import mix, final
    a = self.attr1
    b = self.attr2
    c = self.attr3
    a, b, c = mix(a, b, c)
    a += self.attr4
    b += self.attr5
    c += self.attr6
    a, b, c = final(a, b, c)
    return c
Assume the attrs in the sample object are all integers for simplicity. Also assume that all objects derive from a base class and that each object implements its own __str__(). The advantage of the first hash is that I could implement it once in the base class and not add code to each of the derived objects. But if the second or third __hash__() implementation is better in some way, does that offset the cost of the added code in each derived object (because each may have different attributes)?
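For the efficiency part of the question, a rough way to compare the first two variants is to time them directly. The following is only a sketch: the Sample class, its _attrs() helper, and the two hash_via_* functions are hypothetical names made up for the measurement, and the numbers will vary by machine and Python version, so no results are claimed here.

```python
import timeit


class Sample:
    """Hypothetical object with six integer attributes, as in the question."""

    def __init__(self, *values):
        (self.attr1, self.attr2, self.attr3,
         self.attr4, self.attr5, self.attr6) = values

    def _attrs(self):
        # Hypothetical helper: the attribute tuple used by both variants.
        return (self.attr1, self.attr2, self.attr3,
                self.attr4, self.attr5, self.attr6)

    def __str__(self):
        return "Sample(%d, %d, %d, %d, %d, %d)" % self._attrs()


def hash_via_str(obj):
    # Variant 1: build the string representation, then hash it.
    return hash(str(obj))


def hash_via_tuple(obj):
    # Variant 2: hash the tuple of attributes directly.
    return hash(obj._attrs())


def time_hashers(number=100_000):
    """Return the total seconds taken by `number` calls of each variant."""
    obj = Sample(1, 2, 3, 4, 5, 6)
    return {
        "str": timeit.timeit(lambda: hash_via_str(obj), number=number),
        "tuple": timeit.timeit(lambda: hash_via_tuple(obj), number=number),
    }


if __name__ == "__main__":
    for name, seconds in time_hashers().items():
        print(f"{name}: {seconds:.4f}s")
```

The string variant has to format the whole object before hashing, so its cost depends on how expensive __str__() is for a given class.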
Edit: the import in the third __hash__() implementation is there only because I didn't want to draft out an entire example module + objects. Assume that the import really happens at the top of the module, not on each invocation of the function.
Conclusion: Per the answer and comments on this closed SO question, it looks like I really want the tuple hash implementation, not for speed or efficiency, but because of the underlying duality of __hash__ and __eq__. Since a hash value has a limited range (be it 32 or 64 bits, for example), collisions are possible, and when one occurs object equality is checked. So since I implement __eq__() for each object by comparing tuples of self's and other's attributes, I also want to implement __hash__() using the same attribute tuple, so that objects that compare equal always hash equal.
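One way this conclusion might look in code, while keeping most of the logic in the base class, is sketched below. The _key() hook and the Base and Point names are hypothetical, introduced only for illustration; each derived class supplies its attribute tuple once, and both __eq__() and __hash__() are built from it.

```python
class Base:
    """Hypothetical base class: equality and hashing share one key tuple."""

    def _key(self):
        # Each derived class returns the tuple of attributes that define it.
        raise NotImplementedError

    def __eq__(self, other):
        # Only compare objects of exactly the same class.
        if type(other) is not type(self):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Same tuple as __eq__, so equal objects always hash equal.
        return hash(self._key())


class Point(Base):
    """Hypothetical derived class with two integer attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def _key(self):
        return (self.x, self.y)


if __name__ == "__main__":
    p, q = Point(1, 2), Point(1, 2)
    print(p == q, hash(p) == hash(q), len({p, q}))
```

With this layout the per-class cost is a single _key() method. The attributes in the key should not be changed while an object is stored in a set or used as a dict key, since its hash would change with them.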