Suppose I have a custom class CustomObject that does not define its own __hash__ or __eq__. Is there any condition under which the following two options produce different output?

a = CustomObject(1)
b = CustomObject(1)
setA = set()

# option 1
setA.add(a)
print(b in setA)

# option 2
setA.add(id(a))
print(id(b) in setA)

According to "What is the default __hash__ in python?", the default __hash__ is derived from the id of the object, so I assume there is no difference between the two options above?

If I define a custom __hash__ for CustomObject, as in "add object into python's set collection and determine by object's attribute", the two options above will behave differently, right?
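A minimal sketch of that second case, assuming a hypothetical CustomObject that hashes and compares by its value attribute (the overrides and the names objects, ids, by_object and by_id are illustrative, not taken from the linked question):

```python
class CustomObject:
    """Hypothetical class that hashes and compares by its value attribute."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, CustomObject):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


a = CustomObject(1)
b = CustomObject(1)

objects = {a}   # option 1: store the object itself
ids = {id(a)}   # option 2: store only the ID

by_object = b in objects  # True: b equals a and has the same hash
by_id = id(b) in ids      # False: a and b are alive together, so their IDs differ
print(by_object, by_id)
```

Without the __eq__ and __hash__ overrides, both checks would be False here, because a and b are distinct objects that exist at the same time.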

Barmar
Yi Zhao
  • `id(a)` is `a`'s object ID. So you're not actually adding `a` to the set, but only its ID (which is an `int`). – inspectorG4dget Oct 04 '22 at 01:27
  • Saving IDs is practically never useful. There's no way to go from the ID back to the corresponding object, and IDs can be reused if the object is garbage collected. – Barmar Oct 04 '22 at 01:28
  • The hash function is used as part of the algorithm for adding to sets, but that's not the same as just saving the hash value in the set. – Barmar Oct 04 '22 at 01:32
  • ...but if all the OP wants to do is test for membership of particular classes or objects in particular groups/sets of classes, isn't their question a valid one? – CryptoFool Oct 04 '22 at 01:32
  • @CryptoFool No, because IDs can be reused and you'll get a false positive. – Barmar Oct 04 '22 at 01:33
  • @Barmar - ah...there you go! I think that's what the OP needs to understand then. Seems to me that the question is valid, and that's the answer to the question. – CryptoFool Oct 04 '22 at 01:34
  • @Barmar If I only add ids to the set, the actual object may be garbage collected. If I add the object to the set, the object will not be garbage collected as long as the set is in use (although still the ids are used to hash the objects inside the set). Right? – Yi Zhao Oct 05 '22 at 02:01

2 Answers

Saving the ID can produce a false positive if one of the objects is garbage collected and its ID is then reused by a new object.

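class CustomObject:
    def __init__(self, value):
        self.value = value

a = CustomObject(1)
setA = set()
setA.add(id(a))
del a  # last reference is gone, so the object can be freed and its ID reused
b = CustomObject(1)
print(id(b) in setA)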

This would print True if b gets the same ID that a previously had.
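By contrast, storing the object itself keeps it alive, so its ID cannot be handed to a new object while the set exists. A minimal sketch, assuming the same CustomObject with the default __hash__ and __eq__ (the names kept, old_id, same_id and found are illustrative):

```python
class CustomObject:
    def __init__(self, value):
        self.value = value


a = CustomObject(1)
kept = {a}       # the set holds a strong reference to the object
old_id = id(a)
del a            # the name is gone, but the set still keeps the object alive

b = CustomObject(1)
same_id = id(b) == old_id  # False: the first object still exists, so its ID is taken
found = b in kept          # False: default equality is identity, and b is a new object
print(same_id, found)
```

Both checks are False here because an ID is only guaranteed to be unique among objects whose lifetimes overlap.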

Barmar

For the same reason @Barmar mentions, here is a case that is easier to reproduce: creating temporary CustomObject instances many times can yield only a single address:

>>> class CustomObject:
...     def __init__(self, value):
...         self.value = value
...
>>> {id(CustomObject(1)) for _ in range(10)}
{1799037490496}
>>> {id(CustomObject(i)) for i in range(10)}
{1799034371856}

In addition, iterating over the set gives you only the addresses, not the objects you added. The ctypes library can recover an object from its address, but doing so after the object has been destroyed is not safe.
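A minimal sketch of that ctypes approach, assuming CPython (where id() returns the object's memory address) and an object that is still alive; the names obj, address and recovered are illustrative:

```python
import ctypes


class CustomObject:
    def __init__(self, value):
        self.value = value


obj = CustomObject(1)
address = id(obj)  # in CPython, id() is the object's memory address

# Only safe while obj is still alive; once the object has been
# destroyed, reading from this address is undefined behavior.
recovered = ctypes.cast(address, ctypes.py_object).value
print(recovered is obj)
```

This only works because obj is still referenced when the cast happens. With a stale address the same call may crash the interpreter or return an unrelated object.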

Mechanic Pig