103

I need to create a 'container' object or class in Python, which keeps a record of other objects which I also define. One requirement of this container is that if two objects are deemed to be identical, one (either one) is removed. My first thought was to use a set([]) as the containing object, to complete this requirement.

However, the set does not remove one of the two identical object instances. What must I define to create one?

Here is the Python code.

class Item(object):
  def __init__(self, foo, bar):
    self.foo = foo
    self.bar = bar
  def __repr__(self):
    return "Item(%s, %s)" % (self.foo, self.bar)
  def __eq__(self, other):
    if isinstance(other, Item):
      return ((self.foo == other.foo) and (self.bar == other.bar))
    else:
      return False
  def __ne__(self, other):
    return (not self.__eq__(other))

Interpreter

>>> set([Item(1,2), Item(1,2)])
set([Item(1, 2), Item(1, 2)])

It is clear that __eq__(), which is called by x == y, is not the method called by the set. What is called? What other method must I define?

Note: The Items must remain mutable, and can change, so I cannot provide a __hash__() method. If this is the only way of doing it, then I will rewrite for use of immutable Items.

Ada
  • 1,746
  • 4
  • 15
  • 15
  • 1
    Had this same problem. I assume you are manipulating small amounts of data inside your code. This is probably not a good candidate for the use of a database. I remember being able to create a set and define a comparator function in C++ and I believe Java as well, however it doesn't look like you can do this with dictionary objects in Python. Seems like someone may have written a "set" library in Python that can do this, but I'm not aware of one. – Chris Dutrow May 13 '13 at 22:12

2 Answers2

81

Yes, you need a __hash__()-method AND the comparing-operator which you already provided.

class Item(object):
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar
    def __repr__(self):
        return "Item(%s, %s)" % (self.foo, self.bar)
    def __eq__(self, other):
        if isinstance(other, Item):
            return ((self.foo == other.foo) and (self.bar == other.bar))
        else:
            return False
    def __ne__(self, other):
        return (not self.__eq__(other))
    def __hash__(self):
        return hash(self.__repr__())
ruena
  • 811
  • 1
  • 6
  • 2
  • 5
    On Python 3, you don't need `__ne__`, and even in 2.x, you shouldn't define `__ne__` in terms of `__eq__`. See, https://stackoverflow.com/a/30676267/5337834 – John Strood Aug 10 '18 at 10:27
40

I am afraid you will have to provide a __hash__() method. But you can code it the way, that it does not depend on the mutable attributes of your Item.

eumiro
  • 207,213
  • 34
  • 299
  • 261
  • 3
    In the second paragraph here, it points out that `__hash__()` should only be defined for immutable objects. – Ada Oct 15 '10 at 13:06
  • 1
    @Nathanael: if the object may have to change, you can make an immutable copy of the object, like frozenset() and set(). – Lie Ryan Oct 15 '10 at 13:18
  • 2
    @Nathanael - how would you like to call `__eq__`? Comparing those (1,2) attributes? Then you have to return some hash of (1,2) also in your `__hash__` method. – eumiro Oct 15 '10 at 13:24
  • Or, since foo and bar are both immutable, the __hash__() could return the sum of the hashes of foo and bar? No... sum is too unrelible... if foo was 1 and bar 2, that would equal bar as 1 with foo as 2, which is wrong. What mathematical function can be used for this purpose? Modulo or division should work. – Ada Oct 15 '10 at 13:26
  • 2
    Nathanael: Just use the `hash` builtin: `hash((self.foo, self.bar))`. This uses the tuple's hash, which is appropriate for your needs. (Your `__eq__` could also be written in terms of `tuple` comparison.) – Pi Delport Oct 15 '10 at 13:30
  • @Nathanael: Piet's proposal with `hash((self.foo, self.bar))` is perfect for your needs. – eumiro Oct 15 '10 at 13:33
  • @Nathanael: Hash collision is Ok. python's set will double check with `__eq__` to ensure that objects with the same hashes are really equivalent; however you'd want the least amount collision possible for the best performance of set(). – Lie Ryan Oct 15 '10 at 13:35
  • If you ever want to do something like this `i = item(1,1); i.foo = 2` then you *can't use a set*! The set won't get notified of any changes to it's contents and that is why `__hash__` must be static in the first place! If you won't change your items, then you should make your Items really immutable. – Jochen Ritzel Oct 15 '10 at 13:40
  • @THC4k Thanks for pointing that out. I had decided to treat them as immutable, because the hashing function is based on the hashes of the attributes, anyway. Now, when I modify them, I use symmetric_difference_update(old_item, new_item), where new item is based on old_item. – Ada Oct 16 '10 at 17:57