Python 2: different meaning of the 'in' keyword for sets and lists

Question

Consider this snippet:

class SomeClass(object):

    def __init__(self, someattribute="somevalue"):
        self.someattribute = someattribute

    def __eq__(self, other):
        return self.someattribute == other.someattribute

    def __ne__(self, other):
        return not self.__eq__(other)

list_of_objects = [SomeClass()]
print(SomeClass() in list_of_objects)

set_of_objects = set([SomeClass()])
print(SomeClass() in set_of_objects)

which evaluates to:

True
False

Can anyone explain why the 'in' keyword has a different meaning for sets and lists? I would have expected both to return True, especially when the type being tested has equality methods defined.

See http://stackoverflow.com/questions/7549709/unexpected-behavior-for-python-set-contains .. incidentally, in Python 3 running this code hints at what's going on: "TypeError: unhashable type: 'SomeClass'" — DSM, Feb 13 '12 at 04:08
BTW, you do know that your `someattribute` here is a **class** attribute and not an **instance** attribute, right? You *have* heard of `__init__`, right? — Karl Knechtel, Feb 13 '12 at 07:01

score 17 · Accepted Answer · answered Feb 13 '12 at 04:07

The meaning is the same, but the implementation is different. Lists simply examine each element in turn, checking for equality, so it works for your class. Sets hash the object first, and if the class doesn't implement __hash__ consistently with __eq__, the lookup appears not to work.

Your class defines __eq__ but not __hash__, so it won't work properly in sets or as a dictionary key. The rule for __eq__ and __hash__ is that two objects that compare equal must also have equal hashes. By default, objects hash based on their memory address, so your two objects, which are equal by your definition, have different hashes and break that rule.
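A minimal sketch of that mismatch. Python 3 disables hashing for a class that defines only __eq__, so the hypothetical class `IdentityHashed` restores the identity-based hash explicitly to mimic the Python 2 default; the class name and the explicit `__hash__` line are assumptions for illustration, not part of the original code.

```python
class IdentityHashed(object):
    """Equality by value, hash by identity (mimics the Python 2 default)."""

    def __init__(self, someattribute="somevalue"):
        self.someattribute = someattribute

    def __eq__(self, other):
        return self.someattribute == other.someattribute

    # Assumption: restore the identity-based hash explicitly. Python 2
    # new-style classes get this behavior when __hash__ is not defined.
    __hash__ = object.__hash__


a = IdentityHashed()
b = IdentityHashed()

print(a == b)              # equality uses someattribute
print(hash(a) == hash(b))  # hashes come from identity, not someattribute
print(b in [a])            # list: linear scan using ==
print(b in set([a]))       # set: lookup by hash before == is tried
```

On CPython the two objects compare equal but hash differently, so the list test succeeds and the set test fails, matching the output in the question.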

If you provide a __hash__ implementation, it will work fine. For your sample code, it could be:

def __hash__(self):
    return hash(self.someattribute)
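Put together with the class from the question, the fix might look like the sketch below. The variable names `in_list` and `in_set` are added here for illustration; everything else follows the original snippet.

```python
class SomeClass(object):

    def __init__(self, someattribute="somevalue"):
        self.someattribute = someattribute

    def __eq__(self, other):
        return self.someattribute == other.someattribute

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Hash the same value that __eq__ compares, so objects that
        # compare equal always produce equal hashes.
        return hash(self.someattribute)


list_of_objects = [SomeClass()]
set_of_objects = set([SomeClass()])

in_list = SomeClass() in list_of_objects
in_set = SomeClass() in set_of_objects
print(in_list)
print(in_set)
```

Both tests now report True. One caveat with this approach: if someattribute is changed while the object is stored in a set or used as a dictionary key, its hash changes too and later lookups can miss it, so the attribute is best treated as immutable.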

answered Feb 13 '12 at 04:07

Ned Batchelder


This is one of the things that Python 3 handles more clearly: it will refuse to make a set out of any object that doesn't have a `__hash__()`. Python 2 has a default `__hash__()` that reflects object identity rather than equality. – lvc Feb 13 '12 at 04:11
Actually, what happened is that classic classes behaved the same way (not defining a `__hash__` method would give you a default that raised a TypeError if you did define `__cmp__` and/or `__eq__`,) but then new-style classes were introduced (in Python 2.2) and that behaviour wasn't correctly copied over. That oversight was missed for enough releases that changing it would potentially break too much code, so fixing it was delayed until Python 3. – Thomas Wouters Feb 13 '12 at 20:13
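The Python 3 behavior mentioned in these comments can be checked with a short sketch. The class name `EqOnly` and the variable `error_message` are hypothetical; the output described applies to Python 3 only (on Python 2 the set is created without error).

```python
class EqOnly(object):
    """Defines __eq__ but deliberately leaves out __hash__."""

    def __init__(self, someattribute="somevalue"):
        self.someattribute = someattribute

    def __eq__(self, other):
        return self.someattribute == other.someattribute


# On Python 3, defining __eq__ without __hash__ sets __hash__ to None.
print(EqOnly.__hash__)

try:
    set([EqOnly()])
    error_message = None
except TypeError as exc:
    # Python 3 refuses to hash the instance, so the set is never built.
    error_message = str(exc)

print(error_message)
```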

score 3 · Answer 2 · answered Feb 13 '12 at 04:07

In pretty much any hashtable implementation, including Python's, if you override the equality method you must also override the hashing method (in Python, this is __hash__). The in operator for lists just checks equality against every element of the list, while the in operator for sets first hashes the object you are looking for, checks for an object in that slot of the hashtable, and then checks for equality only if there is something in the slot. So, if you override __eq__ without overriding __hash__, there is no guarantee that the in operator for sets will look in the right slot.
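The two lookup strategies can be sketched with a toy hash table. `ToyHashSet`, `FixedHashKey`, and `list_contains` are hypothetical names invented for this illustration; CPython's real set uses a different internal layout, but the hash-first, equality-second order is the same idea.

```python
class ToyHashSet(object):
    """Hypothetical bucketed hash table, for illustration only."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_for(self, item):
        # The hash alone decides which slot is inspected.
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        bucket = self._bucket_for(item)
        if not any(item == existing for existing in bucket):
            bucket.append(item)

    def __contains__(self, item):
        # Equality is only tried against items in the chosen slot.
        return any(item == existing for existing in self._bucket_for(item))


def list_contains(items, target):
    """List-style membership: compare against every element."""
    return any(target == item for item in items)


class FixedHashKey(object):
    """Hypothetical key whose hash is chosen by the caller."""

    def __init__(self, value, fixed_hash):
        self.value = value
        self.fixed_hash = fixed_hash

    def __eq__(self, other):
        return self.value == other.value

    def __hash__(self):
        return self.fixed_hash


stored = FixedHashKey("x", 0)
probe = FixedHashKey("x", 1)   # equal to stored, but hashes differently

toy = ToyHashSet()
toy.add(stored)

found_in_list = list_contains([stored], probe)
found_in_toy_set = probe in toy
print(found_in_list)
print(found_in_toy_set)
```

Here the linear scan finds the equal object, while the toy set looks in slot 1, finds it empty, and never calls __eq__ at all.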

score 1 · Answer 3 · answered Feb 13 '12 at 04:07

Define a __hash__() method that corresponds to your __eq__() method. Example.

answered Feb 13 '12 at 04:07

jfs
