135

When implementing a class with multiple properties (like in the toy example below), what is the best way to handle hashing?

I guess that the __eq__ and __hash__ should be consistent, but how to implement a proper hash function that is capable of handling all the properties?

class AClass:
  def __init__(self):
      self.a = None
      self.b = None

  def __eq__(self, other):
      return other and self.a == other.a and self.b == other.b

  def __ne__(self, other):
    return not self.__eq__(other)

  def __hash__(self):
      return hash((self.a, self.b))

I read on this question that tuples are hashable, so I was wondering if something like the example above was sensible. Is it?

Community
  • 1
  • 1
abahgat
  • 13,360
  • 9
  • 35
  • 42
  • 5
    Just make sure to use `hash()` on a tuple with exactly the elements that are compared in `__eq__()` and friends (exactly as you did) and you're good to go. – Feuermurmel Apr 22 '14 at 20:52

3 Answers3

101

__hash__ should return the same value for objects that are equal. It also shouldn't change over the lifetime of the object; generally you only implement it for immutable objects.

A trivial implementation would be to just return 0. This is always correct, but performs badly.

Your solution, returning the hash of a tuple of properties, is good. But note that you don't need to list all properties that you compare in __eq__ in the tuple. If some property usually has the same value for inequal objects, just leave it out. Don't make the hash computation any more expensive than it needs to be.

Edit: I would recommend against using xor to mix hashes in general. When two different properties have the same value, they will have the same hash, and with xor these will cancel eachother out. Tuples use a more complex calculation to mix hashes, see tuplehash in tupleobject.c.

Community
  • 1
  • 1
adw
  • 4,901
  • 1
  • 25
  • 18
  • 5
    As you said hash functions usually only make sense for immutable objects. Hence it is possible to calculate the hash-value once in `__init__`. – Björn Pollex Oct 23 '10 at 19:20
  • 6
    +1 for the `return 0` hash function - I've always thought that anything else is premature optimisation :-). (I'm only half kidding). – Scott Griffiths Oct 23 '10 at 20:38
  • 7
    @BjörnPollex Rather than doing it in `__init__`, you can just cache the value in `__hash__`. That way if `__hash__` is never called, you didn't waste either time or memory. I assume checking whether the value has been already cached isn't expensive is it? (Not sure if it's best through exception or explicit `if`). – max Apr 22 '12 at 05:59
  • 1
    It's unfortunate that Python does not make a `combine_hashes` function available. – Fred Foo Sep 20 '12 at 11:34
  • Your first statement seems incorrect. "It also shouldn't change over the lifetime of the object; generally you only implement it for immutable objects." It is implemented in mutable objects as well no? – Aaron Barclay Aug 13 '13 at 00:49
  • 3
    It's not implemented in things like dict or list, the justification being that changing the hash of an object that already belongs to, e.g., a set wreaks havoc on the set's internal data structures. – javawizard Sep 05 '13 at 22:03
  • 1
    @AaronBarclay in Python 2, all user-defined classes inherit the default `object.__hash__` by default, so they are always hashable, but frequently break the hash-equality invariant if they define a custom `__eq__`. So in Py2 if you define `__eq__`, you should really always either set `__hash__ = None` (if instances are mutable) or define `__hash__` (if immutable). In Py3 defining `__eq__` automatically sets `__hash__` to `None`, so just avoid defining `__hash__` unless instances are immutable. See https://asmeurer.github.io/blog/posts/what-happens-when-you-mess-with-hashing-in-python/ – Carl Meyer Jul 20 '16 at 17:46
  • @FredFoo: But they do. That's what `hash` called on a `tuple` does, combine the hashes of the values in the `tuple`. The cost of making a small `tuple` is trivial (Python optimizes `tuple`s specially since they're created implicitly in many different ways to support non-`tuple`-specific functionality), and calling `hash` on it combines the hashes of its component parts. Any reasonable implementation of a `combine_hashes` function would do the same thing, constructing a `tuple` of the arguments (built-ins receive their arguments as a `tuple` implicitly), then calling `hash` on it. – ShadowRanger Nov 04 '19 at 22:57
  • @FredFoo: `combine_hashes` would actually be more complex than that, because if you're going to expose such a function, you'd need to make an order-insensitive version as well (or allow passing an argument to make it an order-insensitive hash). Python's solution for that case is to just do `hash(frozenset({attributes, to, hash}))`, which gets the canonical order-insensitive hash. – ShadowRanger Nov 04 '19 at 22:59
  • Could someone elaborate on `return 0` being *always correct, but performing badly*? How can this always be correct? Shouldn't `hash` values be different for different objects? What about the performance part? Why does it *perform badly*? – Christopher Graf Mar 17 '21 at 12:59
  • 1
    @ChrisGraf Performance-wise it works great, but it performs badly because ideally different objects would hash to different values. It's simply not a useful hash function for any case where it would be important for different objects to have different hashes (most of them). It's a tongue-in-cheek example – Brandon Oct 25 '22 at 23:19
28

Documentation for object.__hash__(self)

The only required property is that objects which compare equal have the same hash value; it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple. Example

def __hash__(self):
    return hash((self.name, self.nick, self.color))
Raniz
  • 10,882
  • 1
  • 32
  • 64
S.Lott
  • 384,516
  • 81
  • 508
  • 779
  • 5
    It will work, but it's bad that if you exchange `self.a` and `self.b` then you'll get the same hash while it will be the other "object". – eigenein Jun 20 '11 at 21:16
  • 1
    "somehow mix together (e.g. using exclusive or" is a pretty flexible set of requirements. If it actually matters, then `(hash(self.a)<<1) ^ hash(self.b)` might be better. There's no general answer, just a general guideline that has to be modified based on the specific application. – S.Lott Jun 20 '11 at 21:18
  • @eigenein if many cases, it's an advantage that the hash is unchanged when the order is changed. if you try to hash a `dict` or `set`, the hash that depends on the order of iteration is invalid. OTOH, the hash that simply causes extra collisions once in a while is, at worst, inefficient. – max Apr 16 '15 at 03:11
  • 26
    why not just hash a tuple value? hash((self.a, self.b)) – nightpool Feb 20 '16 at 20:38
  • 8
    Note that (fortunately) the suggestion to use xor is no longer present in the [Python 3](https://docs.python.org/3/reference/datamodel.html#object.__hash__) or [Python 2](https://docs.python.org/2/reference/datamodel.html#object.__hash__) docs. – PM 2Ring Jan 17 '17 at 12:05
  • 12
    For those whom are interested, here is the bug that led to removal of XOR recommendation: https://bugs.python.org/issue28383 – AXO Aug 21 '17 at 16:16
  • 1
    @axo we should update this answer with the hash of tuples – fersarr Oct 11 '17 at 09:17
  • 1
    @AXO, thanks for providing the link! Very interesting stuff. – Zaya Sep 28 '19 at 02:28
22

It's dangerous to write

def __eq__(self, other):
  return other and self.a == other.a and self.b == other.b

because if your rhs (i.e., other) object evaluates to boolean False, it will never compare as equal to anything!

In addition, you might want to double check if other belongs to the class or subclass of AClass. If it doesn't, you'll either get exception AttributeError or a false positive (if the other class happens to have the same-named attributes with matching values). So I would recommend to rewrite __eq__ as:

def __eq__(self, other):
  return isinstance(other, self.__class__) and self.a == other.a and self.b == other.b

If by any chance you want an unusually flexible comparison, which compares across unrelated classes as long as attributes match by name, you'd still want to at least avoid AttributeError and check that other doesn't have any additional attributes. How you do it depends on the situation (since there's no standard way to find all attributes of an object).

Anaphory
  • 6,045
  • 4
  • 37
  • 68
max
  • 49,282
  • 56
  • 208
  • 355
  • 8
    Useful info, but unrelated to the primary question about hashing. – Mad Physicist Jan 04 '18 at 17:19
  • Not related, but thank you for posting this never the less. +1. – Leo Ufimtsev May 31 '19 at 16:39
  • 1
    This is a bad `__eq__` implementation, as it does not delegate to the right hand side's `__eq__` if the left hand does not know how to do the comparison. If `int` had a `__eq__` like this, `1 == MyNumericType(1)` would always be `False`, even if `MyNumericType(1) == 1` would return `True`. If you don't recognize the type of `other`, always `return NotImplemented`, don't just `return False`. – ShadowRanger Nov 04 '19 at 23:03
  • @ShadowRanger Agreed. – max Nov 05 '19 at 08:31