1

How can I retain the uniqueness guarantee of a set when attributes of user-defined instances are modified after they have been added to the set?

For example, in the code below, the Person instances "Jack" and "John" differ in the attribute used for equality (name), so both are added to the set. If I then change the name of "Jack" to "John", the two instances jack and john compare equal, but the set does not reflect that: it still treats them as two different elements.

Note: this can lead to bugs when someone accidentally modifies user-defined instances after they have been added to a set.

Is there a way to refresh the set, or how can I avoid this issue?

class Person: 
    def __init__(self, name): 
        self.name = name 
    def __eq__(self, other):
        return self.name == other.name
    def __hash__(self):
        return hash(self.name)
jack = Person("Jack")
john = Person("John")
set1 = {jack, john}
jack.name = "John"
print(set1)  # prints 2 instances instead of 1; undesired, because jack and john are now equal
mkrieger1
Jackie
    Members of a Python set are expected to be immutable. Hashes are only compared on insert; mutating an object that is already in a set does not trigger a re-evaluation of the uniqueness of all members. To see why, imagine a large set: if the Python runtime were to allow dynamic uniqueness checks, it would have to loop through the set every time something is updated. See also [this answer](https://stackoverflow.com/questions/31340756/python-why-can-i-put-mutable-object-in-a-dict-or-set) – Felix May 14 '21 at 09:10
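
To make the comment above concrete, here is a small sketch of what the set does after the mutation. It reuses the Person class from the question (with an added isinstance check, which is my assumption, not part of the original), and the probe object Person("Jack") exists only for illustration.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Person) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


jack = Person("Jack")
john = Person("John")
set1 = {jack, john}

jack.name = "John"  # mutate after insertion

# The set is not re-checked on mutation: both entries are still stored.
size_after_mutation = len(set1)

# jack's entry is still filed under the hash computed at insertion time,
# but it no longer compares equal to a Person named "Jack", so this misses.
stale_lookup = Person("Jack") in set1

print(size_after_mutation, stale_lookup)
```

So the set still holds two elements, and the mutated entry can no longer be found under its old name.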

2 Answers

2

You should only use sets of immutable objects or references. See Python docs:

Having a __hash__() implies that instances of the class are immutable.

The Person objects in your set are mutable but you have implemented your own hash and equality functions that get around this, bypassing safety, as you have pointed out.

I think it's fine to define custom hash and equality functions but they should always return the same thing no matter what you do to the things they reference: e.g., using an ID or memory address to hash.
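
As a rough sketch of the identity idea: a class that defines neither __eq__ nor __hash__ already gets Python's default identity-based versions, so changing an attribute cannot affect set membership. The class below is a stripped-down, hypothetical variant of the question's Person, not a recommendation for every case.

```python
class Person:
    """No __eq__/__hash__ overrides: equality and hash are identity-based."""

    def __init__(self, name):
        self.name = name


jack = Person("Jack")
john = Person("John")
people = {jack, john}

hash_before = hash(jack)
jack.name = "John"  # mutation does not touch the identity-based hash

assert hash(jack) == hash_before
assert jack in people and john in people
assert jack != john  # still distinct objects, even with equal names
print(len(people))
```

The trade-off is that two Person objects with the same name are now treated as different elements, which may or may not be what you want.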

I suggest one of two options, with a strong preference for the first:

Option A: Immutable Person

Make Person immutable when constructed. My favourite way of doing this is with a dataclass:

from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str

jack = Person("Jack")
john = Person("John")

# Note you don't need to define your own hash method.

set1 = {jack, john}

# This will fail with dataclasses.FrozenInstanceError, because the instance is frozen:

jack.name = "Jaques"

# Consider the need for this. But if you have, say, a lot of different
# fields on the Person and want to just change one or a few, try:

import dataclasses

jaques = dataclasses.replace(jack, name="Jaques")  # changes are passed as keyword arguments

# But note this is a different object. The set is still the same as before.
# You need to remove "jack" from the set and add "jaques" to it.
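
The remove-and-add step mentioned in the last comment could look roughly like this. It is a self-contained sketch that repeats the frozen dataclass from above; the variable names follow the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Person:
    name: str


jack = Person("Jack")
john = Person("John")
set1 = {jack, john}

# Build the renamed copy, then swap it into the set by hand.
jaques = replace(jack, name="Jaques")
set1.remove(jack)
set1.add(jaques)

print(set1)
```

Because the swap goes through remove() and add(), the set hashes the new object itself and uniqueness is preserved.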

Option B: Recalculate the Set

I should note that I don't think this is a good idea, but you could simply run:

set1 = {jack, john}

...again, and it will recalculate the set.
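
A minimal sketch of that rebuild, using the mutable Person class from the question. The second variant, going through a list, is my own assumption for the case where you only have the set at hand and not the individual variables.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Person) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


jack = Person("Jack")
john = Person("John")
set1 = {jack, john}

jack.name = "John"
stale_size = len(set1)  # the old set still holds both objects

# Variant 1: write the set out again, as suggested above.
rebuilt_from_names = {jack, john}

# Variant 2: iterate through a list, so the elements are added one by one
# and each of them is hashed again.
rebuilt_from_set = set(list(set1))

print(stale_size, len(rebuilt_from_names), len(rebuilt_from_set))
```

Either way you have to remember to rebuild after every mutation, which is why I prefer Option A.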

mkrieger1
Thomas
  • Thanks for pointing this out. It seems that I have violated the definition of a "hashable" object, which says "its hash value won't change during its life cycle". Because my hash depends on the name attribute, which can be changed, the hash value can change during the object's life cycle. Following your hint, any attributes used for hashing must be read-only to avoid bugs like mine. By the way, "hashable doesn't imply immutable" but "immutable implies hashable". Now my head is clear about this problem ^_^ – Jackie May 14 '21 at 19:12
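
One way to read the "read-only attributes" idea from this comment is sketched below. It is only an illustration: the _name attribute and the property without a setter are my assumptions, not code from the thread.

```python
class Person:
    def __init__(self, name):
        self._name = name  # private by convention; not meant to be reassigned

    @property
    def name(self):
        # No setter is defined, so `person.name = ...` raises AttributeError.
        return self._name

    def __eq__(self, other):
        return isinstance(other, Person) and self._name == other._name

    def __hash__(self):
        return hash(self._name)


jack = Person("Jack")
john = Person("John")
set1 = {jack, john}

try:
    jack.name = "John"
    rename_blocked = False
except AttributeError:
    rename_blocked = True

print(rename_blocked, len(set1))
```

Note that this only guards against accidental changes: assigning to jack._name directly still works, so a frozen dataclass gives a stronger guarantee.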
0

You created two different objects, and if you print set1 you'll get something like: {<__main__.Person object at 0x7f8dfbfc5e10>, <__main__.Person object at 0x7f8dfbfe2a10>}

Even though their name attributes are now the same, they're still two different objects stored at different memory addresses. That's why you get the unexpected behavior of still having both of them in the set!

When you do jack.name = "John" you're only changing the attribute self.name.

To get the outcome you wanted, build the set from the names instead: set1 = {jack.name, john.name}

This gives you {'John'}

Dharman
  • Thanks for replying. Maybe my question was confusing. I am asking how to avoid a set unexpectedly losing uniqueness when user-defined objects are used as its elements and their attributes are then accidentally modified. I still want to keep a set of user-defined objects rather than a set of name attributes. – Jackie May 14 '21 at 19:19