240

What must I do to use my objects of a custom type as keys in a Python dictionary (where I don't want the "object id" to act as the key) , e.g.

class MyThing:
    def __init__(self,name,location,length):
            self.name = name
            self.location = location
            self.length = length

I'd want to use MyThing's as keys that are considered the same if name and location are the same. From C#/Java I'm used to having to override and provide an equals and hashcode method, and promise not to mutate anything the hashcode depends on.

What must I do in Python to accomplish this ? Should I even ?

(In a simple case, like here, perhaps it'd be better to just place a (name,location) tuple as key - but consider I'd want the key to be an object)

aknuds1
  • 65,625
  • 67
  • 195
  • 317
Anonym
  • 7,345
  • 8
  • 35
  • 32
  • What's wrong with using the hash? – Rafe Kettler Feb 04 '11 at 18:55
  • 7
    Probably because he wants two `MyThing`, if they have the same `name` and `location`, to index the dictionary to return the same value, even if they were created separately as two different "objects". – Santa Feb 04 '11 at 19:00
  • 1
    "perhaps it'd be better to just place a (name,location) tuple as key - but consider I'd want the key to be an object)" You mean: a NON-COMPOSITE object ? – eyquem Feb 04 '11 at 21:20

6 Answers6

284

You need to add 2 methods, note __hash__ and __eq__:

class MyThing:
    def __init__(self,name,location,length):
        self.name = name
        self.location = location
        self.length = length

    def __hash__(self):
        return hash((self.name, self.location))

    def __eq__(self, other):
        return (self.name, self.location) == (other.name, other.location)

    def __ne__(self, other):
        # Not strictly necessary, but to avoid having both x==y and x!=y
        # True at the same time
        return not(self == other)

The Python dict documentation defines these requirements on key objects, i.e. they must be hashable.

Nate Anderson
  • 18,334
  • 18
  • 100
  • 135
6502
  • 112,025
  • 15
  • 165
  • 265
  • 19
    `hash(self.name)` looks nicer than `self.name.__hash__()`, and if you do and you can do `hash((x, y))` to avoid XORing yourself. – Rosh Oxymoron Feb 04 '11 at 19:02
  • 5
    As an additional note, I just discovered that calling `x.__hash__()` like that is also *wrong*, because it _can_ produce _incorrect_ results: http://pastebin.com/C9fSH7eF – Rosh Oxymoron Feb 04 '11 at 20:32
  • @Rosh Oxymoron: thank you for the comment. When writing I was using explicit `and` for `__eq__` but then I thought "why not using tuples?" because I often do that anyway (I think it's more readable). For some strange reason my eyes didn't go back to question about `__hash__` however. – 6502 Feb 04 '11 at 20:39
  • @Rosh Oxymoron: I'm not sure I understand 100% your second comment. Are an instance of `A` and one of `B` to compare equal or not? If yes then that `__hash__` implementation is breaching the contract (if two objects compare equal they should return the same value from `__hash__` - see http://docs.python.org/reference/datamodel.html#object.__hash__ ). If on the other side they're not going to compare equal then it's ok for an implementation to call `__hash__` directly... what am I missing? – 6502 Feb 04 '11 at 21:30
  • To be honest, this is the kind of thing that I don't like to hide in an object if I can help. Far more readable and maintainable to just use a tuple of name and location as the key. A dictionary is a kind of in memory database and it is natural to use tuples for compound keys. – Michael Dillon Feb 04 '11 at 22:14
  • @6502: It was merely an example of how `x.__hash__()` is different from `hash(x)` and how it might lead to incorrect behaviour in exotic cases. It only tests how `__hash__` behaves, so how `__eq__` works is irrelevant to the example (but all my `A` and `B` objects have the same hash, so the contract is respected). The problem is that `hash(a) == hash(b)` but `a.__hash__() != b.__hash__()`, so when you use the second in an expression you get incorrect result. Here's a more detailed example, which also defines a proper `__eq__`: http://pastebin.com/F0anqN73 – Rosh Oxymoron Feb 04 '11 at 22:20
  • @Rosh Oxymoron: I agree that `__hash__()` is different from `hash()`, but still IMO if a/b `__hash__()` are respecting the contract I think you can use `a.__hash__()^b.__hash__()` as a valid implementation. Your second example too is not respecting the rules for a valid `__hash__()` implementation: for example `b1==a1` but `b1.__hash__()!=a1.__hash__()`. Ok that the system function `hash` adds a transformation on the value obtained from `__hash__` (e.g. `hash(-1)==-2`), but that's irrelevant... combining `__hash__`s with xor is valid anyway (it will never make equal things hashing different). – 6502 Feb 04 '11 at 23:10
  • WARNING if you do any other comparisons on objects, `__ne__()` doesn't come for free. The default implementation is akin to `not is`. So if you don't define it, objects can be **both equal and not equal**. Recommend `def __ne__(self, other): return not self.__eq__(other)` – Bob Stein Mar 22 '16 at 15:57
  • @6502 This give me error "Vector is not frozen (mutable), call freeze first" – user877329 Apr 24 '16 at 07:10
  • 1
    @user877329: are you trying to use some blender data structure as keys? Apparently from some repos certain objects require you to "freeze" them first to avoid mutability (mutating a value-based object that has been used as a key in a python dictionary is not permitted) – 6502 Apr 24 '16 at 07:22
  • @6502 Yes I do. Adding `self.uv.freeze()`, and `self.normal.freeze()` in CTOR solves this problem. – user877329 Apr 24 '16 at 07:24
  • @BobStein-VisiBone How could this happen? Can you give an example? The doc says: `By default, __ne__() delegates to __eq__() and inverts the result unless it is NotImplemented.` – kawing-chiu Jun 08 '16 at 07:10
  • 1
    @kawing-chiu http://pythonfiddle.com/eq-method-needs-ne-method/ <-- this shows the "bug" in Python 2. **Python 3 does not have this problem**: the default `__ne__()` has been ["fixed"](http://pastebin.com/Uhz79Rhj). – Bob Stein Jun 08 '16 at 12:39
  • Suppose we only want to use **self.name** as the key. We should only change it in the __hash__ function right. But in **__eq__** we shouldn't remove it – Hani Gotc Jul 26 '17 at 13:32
  • you forgot the length what Happened to the LENGTH? – Hani Gotc Jul 26 '17 at 13:40
  • @HaniGotc: length of what? A dictionary key doesn't need to have a length (you can use integers as key for example). – 6502 Jul 26 '17 at 19:17
  • @HaniGotc: sorry re-reading now I understood what you mean... in the question however the OP is saying that two things should be equal if `name` and `location` are the same; it's not considering `length`. – 6502 Dec 14 '22 at 11:58
38

An alternative in Python 2.6 or above is to use collections.namedtuple() -- it saves you writing any special methods:

from collections import namedtuple
MyThingBase = namedtuple("MyThingBase", ["name", "location"])
class MyThing(MyThingBase):
    def __new__(cls, name, location, length):
        obj = MyThingBase.__new__(cls, name, location)
        obj.length = length
        return obj

a = MyThing("a", "here", 10)
b = MyThing("a", "here", 20)
c = MyThing("c", "there", 10)
a == b
# True
hash(a) == hash(b)
# True
a == c
# False
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
22

You override __hash__ if you want special hash-semantics, and __cmp__ or __eq__ in order to make your class usable as a key. Objects who compare equal need to have the same hash value.

Python expects __hash__ to return an integer, returning Banana() is not recommended :)

User defined classes have __hash__ by default that calls id(self), as you noted.

There is some extra tips from the documentation.:

Classes which inherit a __hash__() method from a parent class but change the meaning of __cmp__() or __eq__() such that the hash value returned is no longer appropriate (e.g. by switching to a value-based concept of equality instead of the default identity based equality) can explicitly flag themselves as being unhashable by setting __hash__ = None in the class definition. Doing so means that not only will instances of the class raise an appropriate TypeError when a program attempts to retrieve their hash value, but they will also be correctly identified as unhashable when checking isinstance(obj, collections.Hashable) (unlike classes which define their own __hash__() to explicitly raise TypeError).

Skurmedel
  • 21,515
  • 5
  • 53
  • 66
  • 2
    The hash alone is not enough, additionally you either need to override `__eq__` or `__cmp__`. – Oben Sonne Feb 04 '11 at 18:59
  • @Oben Sonne: `__cmp__` is given to you by Python if it is a user defined class, but you probably want to override them anyway to accommodate for new semantics. – Skurmedel Feb 04 '11 at 19:00
  • 1
    @Skurmedel: Yes, but although you can call `cmp` and use `=` on user classes which do not override these methods, one of them must be implemented to meet the questioner's requirement that instances with similar name and location have the same dictionary key. – Oben Sonne Feb 04 '11 at 19:08
12

I noticed in python 3.8.8 (maybe ever earlier) you don't need anymore explicitly declare __eq__() and __hash__() to have to opportunity to use your own class as a key in dict.

class Apple:
    def __init__(self, weight):
        self.weight = weight
        
    def __repr__(self):
        return f'Apple({self.weight})'

apple_a = Apple(1)
apple_b = Apple(1)
apple_c = Apple(2)

apple_dictionary = {apple_a : 3, apple_b : 4, apple_c : 5}

print(apple_dictionary[apple_a])  # 3
print(apple_dictionary)  # {Apple(1): 3, Apple(1): 4, Apple(2): 5}

I assume from some time Python manages it on its own, however I can be wrong.

iamtodor
  • 734
  • 8
  • 21
1

The answer for today as I know other people may end up here like me, is to use dataclasses in python >3.7. It has both hash and eq functions.

  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Sep 24 '21 at 10:54
0

@dataclass(frozen=True) example (Python 3.7)

@dataclass had been previously mentioned at: https://stackoverflow.com/a/69313714/895245 but here's an example.

This awesome new feature, among other good things, automatically defines a __hash__ and __eq__ method for you, making it just work as usually expected in dicts and sets:

dataclass_cheat.py

from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class MyClass1:
    n: int
    s: str

@dataclass(frozen=True)
class MyClass2:
    n: int
    my_class_1: MyClass1

d = {}
d[MyClass1(n=1, s='a')] = 1
d[MyClass1(n=2, s='a')] = 2
d[MyClass1(n=2, s='b')] = 3
d[MyClass2(n=1, my_class_1=MyClass1(n=1, s='a'))] = 4
d[MyClass2(n=2, my_class_1=MyClass1(n=1, s='a'))] = 5
d[MyClass2(n=2, my_class_1=MyClass1(n=2, s='a'))] = 6

assert d[MyClass1(n=1, s='a')] == 1
assert d[MyClass1(n=2, s='a')] == 2
assert d[MyClass1(n=2, s='b')] == 3
assert d[MyClass2(n=1, my_class_1=MyClass1(n=1, s='a'))] == 4
assert d[MyClass2(n=2, my_class_1=MyClass1(n=1, s='a'))] == 5
assert d[MyClass2(n=2, my_class_1=MyClass1(n=2, s='a'))] == 6

# Due to `frozen=True`
o = MyClass1(n=1, s='a')
try:
    o.n = 2
except FrozenInstanceError as e:
    pass
else:
    raise 'error'

As we can see in this example, the hashes are being calculated based on the contents of the objects, and not simply on the addresses of instances. This is why something like:

d = {}
d[MyClass1(n=1, s='a')] = 1
assert d[MyClass1(n=1, s='a')] == 1

works even though the second MyClass1(n=1, s='a') is a completely different instance from the first with a different address.

frozen=True is mandatory, otherwise the class is not hashable, otherwise it would make it possible for users to inadvertently make containers inconsistent by modifying objects after they are used as keys. Further documentation: https://docs.python.org/3/library/dataclasses.html

Tested on Python 3.10.7, Ubuntu 22.10.

Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985