1

I'm trying to build a set of instances of an object, however adding instances of certain objects results in a TypeError: unhashable instance. Here is a minimal example:

from sets import Set
import random
from UserDict import DictMixin

class Item1(object):
    pass

class Item2(DictMixin):
    pass

item_collection = Set()

x = Item1()
y = Item2()

item_collection.add(x) # this works
print item_collection
item_collection.add(y) # this does not
print item_collection

Why does that fail and how can I get a set of instances of an object derived from DictMixin?

pysnake
  • 117
  • 3
  • 7

2 Answers2

5

Your classes can, if you want, define __hash__ and comparison methods (most importantly __eq__) to be consistent with each other and "stable" -- i.e., the equality of two objects must not vary over the objects' lifetimes, and of course neither must each object's hash value vary over the object's lifetime.

The consistency requirement is: a==b must imply hash(a)==hash(b) (the reverse need not hold, and indeed rarely does).

So if you're OK with those requirements the simplest implementation would be:

class Item2(DictMixin):
    def __hash__(self): return hash(id(self))
    def __eq__(self, x): return x is self
    def __ne__(self, x): return x is not self

as it happens this would also automatically interoperate with your Item1 class, because this is the default implementation of hashing and comparison for classes that don't inherit or define other versions (as you're inheriting a different version of __eq__ from DictMixin unless you override it again).

x is self is a faster, more direct, and more concise expression of id(x) == id(self), because that is the meaning of the is operator -- identity of id (i.e., same object).

So is the fact that a==b is forced thereby to mean the same thing as a is b a problem for your application? If so, then sets are just not usable for said application and you need to think about some other, completely different data structure (one that's not based on hashing, because without the __eq__ override you can't make hashing work correctly).

Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
  • Are there any downsides to using `id` to generate the hash? I asked about it here http://stackoverflow.com/questions/2040101/using-object-id-as-a-hash-for-objects-in-python since it's something new that occurred to me. – Noufal Ibrahim Jan 11 '10 at 05:38
4

In order to put things into a set, they should be hashable. Tuples for example are hashable whereas lists are not. You can make your object hashable by giving it a __hash__ method that will generate a hash key (a unique identifier for that instance of the class dependent on the data it's holding) for it.

Here is an example to trying to add a list into a set.

>>> x = [1,2,3]
>>> a=set()
>>> a.add(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

It looks like your DictMixin class is not hashable.

Noufal Ibrahim
  • 71,383
  • 13
  • 135
  • 169
  • How would I implement that? I don't want the hash to be dependent on the data it is holding, rather I want each entry of the set to be a reference to a particular instance regardless of the data it is holding as it might change. Something like a pointer to an instance in C terms, – pysnake Jan 10 '10 at 18:37
  • The good lookup characteristics of the set are obtained by hashing the objects which it holds. This necessitates that the objects be hashable since it's next to impossible to lookup objects unless you have a way of converting them into a hash. This is the reason by mutable types like lists cannot be put into a set. You could try using the `id` of the created object as the hash value. I've never tried that and I'm not sure if it's a good idea. Are you sure you want a set? Would a list not do? – Noufal Ibrahim Jan 10 '10 at 18:41
  • I need the elements of the collection to be unique and need to test for membership, so a list does not seem right. Using the id sounds intuitively right, but I don't want to do something like that without understanding all the consequences. Any other suggestions for a more suitable datatype? Basically I need a collection of pointers to instances of an object with unique membership and quick lookup. – pysnake Jan 10 '10 at 18:51