I have a simple custom object that represents a custom tag, which a user can attach to another object.

  • I want to store tags in a set, because I want to avoid duplicates and because order doesn't matter.
  • Each tag contains the values "name" and "description". Later on, I might add other attributes, but the key identifier for a tag is "name".
  • I want to check whether a tag is equal to another either by tag.name == other.name or against a string, as in tag == 'whatever'.
  • I want users to be able to edit tags, including renaming them.

I have defined the object like this and everything worked as expected:

class Tag:

    def __init__(self, name, description=""):
        self.name = name
        self.description = description

    def __str__(self):
        return self.name

    def __repr__(self):
        return self.name

    def __eq__(self, other):
        if isinstance(other, Tag):
            return self.name == other.name
        else:
            return self.name == other

    def __hash__(self):
        return hash(self.name)

The problem appeared when I tried to change the tag name:

blue_tag = Tag("blue") 
tags = {blue_tag}
blue_tag in tags  # returns True as expected
"blue" in tags  # returns True as expected
blue_tag.name = "navy"
"navy" in tags # returns False. Why?

I don't understand why. The tag is correctly renamed when I do print(tags). The id of the blue_tag object is also the same, and the hash of the name is also the same.

Everywhere, including the official Python documentation, I found just the basic info that in checks whether an item is present in a container, and that to define a custom container I need to define custom methods like __contains__. But I don't want to create a custom set class.

The closest thing I found was a question here on SO:

Custom class object and "in" set operator

But it still didn't solve the problem.

ripfruit
    I get `blue_tag in tags` is `False` also, after the rename. If you change the attribute that gives it its identity, Python _can_ (but doesn't _necessarily_) end up looking in the wrong "bucket" for it. – jonrsharpe Mar 25 '22 at 12:11

1 Answer

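The problem is that changing a tag's name attribute changes its hash in the class above, and the hash of an object must not change after it is added to a set or used as a dictionary key. A set files each element under the hash it had at insertion time, so after the rename a lookup computes a different hash and never reaches the stored entry.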
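
To see the failure concretely, the sketch below repeats the Tag class from the question and records the hash before and after the rename. The variable names (hash_before, found_by_new_name and so on) are only illustrative, and the behaviour described in the comments is that of CPython's set.

```python
class Tag:
    def __init__(self, name, description=""):
        self.name = name
        self.description = description

    def __eq__(self, other):
        if isinstance(other, Tag):
            return self.name == other.name
        return self.name == other

    def __hash__(self):
        return hash(self.name)


blue_tag = Tag("blue")
hash_before = hash(blue_tag)   # hash("blue"), remembered by the set on insertion
tags = {blue_tag}

blue_tag.name = "navy"
hash_after = hash(blue_tag)    # now hash("navy")

hash_changed = hash_before != hash_after

# Lookups use the *current* hash, which no longer matches the stored one.
found_by_new_name = "navy" in tags
found_by_object = blue_tag in tags

# The old name has the stored hash, but the equality check now fails.
found_by_old_name = "blue" in tags

# The object is still physically inside the set, as iteration shows.
found_by_iteration = any(tag is blue_tag for tag in tags)
```

All three membership tests come out False even though iterating over the set still yields the renamed tag, which is why print(tags) looks correct.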

The thing is that if two objects are "equal", they must have the same hash value. Since you want your tags to compare equal by name, and also to a plain string, the hash has to be derived from the name, which means the name can't change while the tag is in a set. In other words, you can't simply add another immutable attribute to your class and base your hash value on that instead of the name.
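
As an illustration of why a separate immutable attribute does not help, here is a minimal sketch with a hypothetical IdHashedTag class whose hash comes from a made-up uid counter rather than from the name. Neither the class nor the uid attribute is part of the original code.

```python
import itertools

_uids = itertools.count(1)  # hypothetical source of immutable ids


class IdHashedTag:
    def __init__(self, name):
        self.name = name
        self.uid = next(_uids)  # never changes after creation

    def __eq__(self, other):
        if isinstance(other, IdHashedTag):
            return self.name == other.name
        return self.name == other

    def __hash__(self):
        # Stable across renames, but unrelated to the name.
        return hash(self.uid)


tag = IdHashedTag("blue")
tags = {tag}
tag.name = "navy"

still_member = tag in tags          # the hash did not change, so this works
equal_to_string = tag == "navy"     # equality by name still holds
string_lookup = "navy" in tags      # but hash("navy") is not hash(tag.uid)
twin_lookup = IdHashedTag("navy") in tags  # equal by name, different uid
```

The rename now survives, but objects that compare equal no longer share a hash, so the lookups by string and by an equal tag fail. That is exactly the requirement the question cannot give up.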

The workaround I see in this case is to have a special "add_to_set" method on your Tag class that tracks the sets the tag belongs to, and to turn name into a property, so that whenever name is changed, the tag removes itself from all sets it belongs to and re-adds itself afterwards. The re-inserted tag then behaves as expected.

Making this work properly in parallel code would take somewhat more work, since another thread could use the sets during the renaming. If that is not a problem, then what is needed is:

class Tag:

    def __init__(self, name, description=""):
        self.sets = []
        self.name = name
        self.description = description

    ...  # other methods as in your code 

    def __hash__(self):
        return hash(self.name)

    def add_to_set(self, set_):
        self.sets.append(set_)
        set_.add(self)

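    def remove_from_set(self, set_):
        # Drop the set by identity: list.remove() compares with ==, which
        # would match any other tracked set that has equal contents.
        self.sets = [s for s in self.sets if s is not set_]
        set_.remove(self)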

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # WARNING: this is as thread unsafe as it gets! Do not use this class
        # in multi-threaded code. (async is ok)
        
        try:
            for set_ in self.sets:
                set_.remove(self)
            self._name = value
        finally:
            for set_ in self.sets:
                set_.add(self)

And now:

In [17]: a = Tag("blue")

In [18]: b = set()

In [19]: a.add_to_set(b)

In [20]: a in b
Out[20]: True

In [21]: b
Out[21]: {blue}

In [22]: a.name = "mauve"

In [23]: b
Out[23]: {mauve}

In [24]: a in b
Out[24]: True

It is possible to write a specialized set class that automatically calls the add_to_set and remove_from_set methods for you as well, but this is likely enough.
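
For completeness, a specialized set along those lines might look like the sketch below. It is a self-contained variant rather than a drop-in extension of the class above: the hypothetical TagSet does the bookkeeping itself in a _holders list instead of calling add_to_set and remove_from_set, and only add and remove are covered (discard, pop, update and the set operators would need the same treatment). It is just as thread-unsafe as the version above.

```python
class Tag:
    def __init__(self, name, description=""):
        self._holders = []  # TagSet objects that currently contain this tag
        self._name = name
        self.description = description

    def __repr__(self):
        return self._name

    def __eq__(self, other):
        if isinstance(other, Tag):
            return self._name == other._name
        return self._name == other

    def __hash__(self):
        return hash(self._name)

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        holders = list(self._holders)
        for holder in holders:
            set.remove(holder, self)  # plain set method, skips the bookkeeping
        self._name = value
        for holder in holders:
            set.add(holder, self)     # re-inserted under the new hash


class TagSet(set):
    """Hypothetical set subclass that registers itself with the tags it holds."""

    def __init__(self, tags=()):
        super().__init__()
        for tag in tags:
            self.add(tag)

    def add(self, tag):
        if tag not in self:
            tag._holders.append(self)
            super().add(tag)

    def remove(self, tag):
        # Find the stored object, since `tag` may be an equal Tag or a string.
        stored = next((member for member in self if member == tag), None)
        if stored is None:
            raise KeyError(tag)
        super().remove(stored)
        stored._holders = [h for h in stored._holders if h is not self]


tags = TagSet()
blue_tag = Tag("blue")
tags.add(blue_tag)

blue_tag.name = "navy"
found_new = "navy" in tags
found_old = "blue" in tags
found_object = blue_tag in tags

tags.remove("navy")  # removing by string also unregisters the set
```

One limitation to keep in mind: renaming a tag to a name that another tag in the same set already has leaves only one of them in the set, because the two then compare equal.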

jsbueno