2

I have a helper class to help with string methods. It has a bunch of methods and variables but I want the underlying hash to based on the contents of its 'main' string. So the class looks something similar to this:

class Topic:

    def __init__(self, name):
        self.name = name

    def getName(self):
        return self.name

    def setName(self, newName):
        self.name = newName

    def __str__(self):
        return self.name

however I want a dictionary to hash this object as a string so when I do the following code:

a = Topic('test')
v = {a : 'oh hey'}

print(v[Topic('test')])

I want it to print 'oh hey' instead of throwing a key error. I tried doing this to my Topic class:

def __hash__(self):
    return hash(self.name)

but it didn't work and I can't find online how Python hashes their strings. Is there anyway to make this work the way I intend? Thanks for any information.

sundance
  • 2,905
  • 4
  • 21
  • 31
Pookie
  • 1,239
  • 1
  • 14
  • 44
  • 2
    I would expect `hash(self.name)` to work just fine, what's the error? Note you also need to implement `__eq__` to put something in a dictionary. – jonrsharpe Jul 27 '18 at 16:02
  • No, there are plenty of examples already. See e.g. https://stackoverflow.com/q/7560172/3001761, https://stackoverflow.com/q/4901815/3001761. – jonrsharpe Jul 27 '18 at 16:04
  • 4
    This is probably a bad idea. If you use a Topic as a key, then call `setName` on it, you’ve broken the dictionary. It was bashed under the old name, but it no longer has that hash value. This is why, of the builtin containers, only the immutable ones are hashable. The fact that you have an explicit setter for name (getters and setters are almost never needed, or wanted, in Python) just calls out that you want people to modify these things and break your dictionaries. – abarnert Jul 27 '18 at 16:20

2 Answers2

4

If you read the documentation on __hash__, it explains what's going on, and how to fix it:

If a class does not define an __eq__() method it should not define a __hash__() operation either…

If two values hash the same, but aren't equal, they're not the same key as far as a dict is concerned, they're two different values that happened to have a hash collision. So, your Topic values are still keyed by identity (you can only look up a Topic with the exact same instance, not another instance with the same name), you're just making it less efficient.

To fix that, you want to add an __eq__ method that makes two Topics equal if they have the same name.

def __eq__(self, other):
    return self.name == other.name

But there are two problems with this.


First, your Topic objects will now hash the same as their names—but they won't be equal to them. That probably isn't what you want.

If you want to be able to look up a topic by just using the string as a key, you need to change the __eq__ method to handle that:

def __eq__(self, other):
    return self.name == other or self.name == other.name

Or, if you want two Topics with the same name to work like the same key, but not the name itself, you need to change __hash__ to something like this:

def __hash__(self):
    return hash((type(self), self.name))

So, two Topic values with the name 'spam' will both get hashed as (Topic, "spam"), and will match each other, but won't match the hash of "spam" itself.


The second problem is more serious.

Your Topic objects are mutable. In fact, by using getters and setters (which you usually don't want in Python), you're explicitly calling out that you want people to be able to mutate the name of a Topic.

But if you do that, the same Topic no longer has the same hash value, and no longer equals its original value. This will break any dictionary you'd put it in.

>>> v = {a: 'oh hey'}
>>> a.setName('test2')
>>> v
KeyError: <__main__.Topic object at 0x12370b0b8>

This is covered in the same docs:

If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

This is why the only builtin collections that are hashable are the immutable ones.

Occasionally, this is worth subverting. If you have an type that's mutable in general, but you know you're never going to mutate one of them after it's stored or looked up in a dict, you can, basically, lie to Python and tell it your type is immutable and therefore suitable as a dict key by defining a __hash__ and __eq__ that would break if you mutated the object, but isn't going to break because you're never going to do that.

But usually, you want to follow the rule that if you want something to be a key, it should be immutable.

Usually it's sufficient to just make it "immutable by convention". For example, if you make name "private by convention" by renaming it to _name, and get rid of the setName method and have only getName, your existing class (with the added __hash__ and __eq__ methods) is fine. Sure, someone could break your dicts by changing the private attribute's value out from under you, but you can expect your users to be "consenting adults" and not do that unless they have a good reason.


One last thing, while we're at it: You almost always want to define a __repr__ for a class like this. Notice the error we got above complained about <__main__.Topic object at 0x12370b0b8>? Likewise, if you just evaluate a at the interactive prompt, or print(v), even without any problems, the Topic is going to show up like this. That's because __str__ only affects str, not repr. The usual pattern is:

def __repr__(self):
    return f"{type(self).__name__}({self.name!r})"

Now, you'll see something like Topic("spam") instead of <__main__.Topic object at 0x12370b0b8>.


You may want to take a look at @dataclass, namedtuple, or a third-party library like attrs that can automatically write all of these methods—__init__, __hash__, __eq__, __repr__, and others—for you, and ensure that they all work together properly.

For example, this could replace your entire class definition:

@dataclass(frozen=True)
class Topic:
    name: str

Because it's frozen, it will use a tuple of its attributes—which is just name—for hashing and comparisons.

abarnert
  • 354,177
  • 51
  • 601
  • 671
  • 1
    I am hoping I am not too dumb and that this will help a few people. I searched for a little while and couldn't find anything near what I was asking even though I know there are answers out there. This is hugely appreciated and gave me a ton more insight on the situation than I was even hoping to get. Thank you! – Pookie Jul 27 '18 at 16:49
  • 1
    @Luke Yeah, the information is buried in the reference docs, which not many people read for fun. :) But it really is worth reading the docs on the special methods you want to override, because there are a lot of non-obvious things in there. – abarnert Jul 27 '18 at 16:56
2

In order to make something in Python custom-made hashable we need to not just give it a custom hash function but also make it able to be compared to another version of its same type so the updated code(that works) is as follows:

class Topic:

    def __init__(self, name):
        self.name = name;

    def getName(self):
        return self.name

    def setName(self, newName):
        self.name = newName

    def __str__(self):
        return self.name;

    def __eq__(self, other):
        return self.name == other.name

    def __hash__(self):
        return hash(self.name)

EDIT:

@abarnert pointed out something very wrong with this approach. See the comments below(or his very thorough answer) to understand why you SHOULD NOT do this. It will work but it is deceivingly dangerous and should be avoided.

Pookie
  • 1,239
  • 1
  • 14
  • 44
  • 1
    This code only works as long as you never call `setName`. Try inserting some of these into your dict as keys, then calling `setName` on them, then trying to look them up, iterate the dict, add a new key that has the name the old one used to have, etc. – abarnert Jul 27 '18 at 16:27
  • 2
    Also: your Topic will hash identically to a string that matches its `name`, but it won’t be equal to that string, only to another Topic with the same name. Is that really want you want? Usually for cases like this you want two Topics with the same name to hash the same as each other, but you don’t want them to hash the same as the name itself. You can do that by hashing something else, like `type(self)`, together with `self.name` (just hash a tuple of two things to combine the hashes). – abarnert Jul 27 '18 at 16:32
  • 1
    @abarnert If u want to add all this information to an answer so I can accept it I would gladly do so. This is super eye opening for me as someone coming from Java. I can add your comments to my existing answer but I rather credit to you. – Pookie Jul 27 '18 at 16:34