4

There are several standard ways to make a class hashable, for example (borrowing from SO):

# assume X has 2 attributes: attr_a and attr_b
class X:
  def __key(self):
    return (self.attr_a, self.attr_b)

  def __eq__(x, y):
    return isinstance(y, x.__class__) and x.__key() == y.__key()

  def __hash__(self):
    return hash(self.__key())

Now suppose I have many classes that I want to make hashable. They are all immutable, with immutable attributes, and hashing all these attributes in bulk is acceptable (for a class with too many attributes, we would only want to hash a few attributes that are enough to avoid most collisions). Can I avoid writing __key() method by hand for every class?

Would it be a good idea to make a base class that defines __key(), __eq__, and __hash__ for them? In particular, I'm not sure whether finding all the instance attributes that should go into __hash__ is doable. I know this is generally impossible, but in this case we can assume more about the object (e.g., it's immutable - after __init__ is finished, its attributes are all hashable, etc.).

(If the inheritance hierarchy won't work, perhaps a decorator would?)

Community
  • 1
  • 1
max
  • 49,282
  • 56
  • 208
  • 355

2 Answers2

4

Instances store their attributes in self.__dict__:

>>> class Foo(object):
...     def __init__(self, foo='bar', spam='eggs'):
...         self.foo = foo
...         self.spam = spam
... 
>>> f = Foo()
>>> f.__dict__
{'foo': 'bar', 'spam': 'eggs'}

Provided you don't store any methods on your instances, a default .__key() could be:

def __key(self):
    return tuple(v for k, v in sorted(self.__dict__.items()))

where we sort the items by attribute name; the tuple() call ensures we return an immutable sequence suitable for the hash() call.

For more complex setups you'd have to either test for the types returned by values() (skip functions and such) or use a specific pattern of attributes or repurpose __slots__ to list the appropriate attributes you can use.

Together with your __hash__ and __eq__ methods, that'd make a good base class to inherit from for all your immutable classes.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • Can `__key` just return `self.__dict__.values()`, without converting to `tuple`? The view object seems to be hashable. – max Sep 20 '12 at 12:41
  • Don't I need ensure that the order of the attributes in `self.__dict__.values()` is consistent? Since my `__key` is used not only for hashing, but also for `__eq__`, I can't afford the risk of `__key()` returning the same value for two unequal instances. And I don't think there's any guarantee that a dictionary iterates in the same order if it's created in the same order. (And in fact, it may not be created in the same order... `__init__` might follow a different branch for some instances, causing a different attribute assignment order.) – max Sep 20 '12 at 12:44
  • I've added sorting, to ensure stable ordering. – Martijn Pieters Sep 20 '12 at 12:48
  • @max -- Once you create a dictionary, it iterates in the same order (for that session) as long as you don't add more keys/values. Now the order that `pypy` uses might be different than `Cpython`, or it might be different using python2.6 vs python2.7 -- but that shouldn't really matter ... – mgilson Sep 20 '12 at 12:53
  • But when I test `x.__key() == y.__key()` (for equality), I will be comparing two *different* `__dict__` objects, not the same one, since each instance has its own `__dict__`. I'm curious, does this guarantee extend to different dictionaries in the same session, where insertion order is the same? In any case, I can't guarantee that insertion order is the same here. – max Sep 20 '12 at 12:55
  • @mgilson: It also depends on the *order* you added the keys. If two keys collide, they switch order in the dict too based on the insertion order. – Martijn Pieters Sep 20 '12 at 12:55
  • @MartijnPieters -- Yes, but that's why I said "Once you create a dictionary ... as long as you don't add more keys/values". I mean that if you have a `dict` and iterate over it now and then iterate over it later in the code, it'll yield elements in the same order as long as you don't add to it in between. – mgilson Sep 20 '12 at 12:58
  • @max -- If you can't guarantee insertion order, then you need to `sorted` them. If you can guarantee that insertion order is the same for both dictionaries, then I think you can safely bypass the sorting ... Although I suppose that could be Cpython specific (I'm not positive that the documentation guarantees the order to be consistent across *different* dictionaries with the same keys and insertion order) – mgilson Sep 20 '12 at 13:01
  • @mgilson: Yes, but how can the OP be certain the attributes are always set in the same order for each instance? It's too easy to create bugs relying on the ordering of dicts, better to sort. – Martijn Pieters Sep 20 '12 at 13:01
  • @MartijnPieters -- I suppose it very much depends on the classes. If the object is immutable, then supposedly everything is set up after `__init__` isn't it? If the attributes are set in a consistent order in `__init__`, wouldn't the insertion order in `__dict__` be the same? -- However, you're right, it is *safer* to sort. But, that also is an `O(N logN)` operation every time you want the hash which could kill your efficiency using this as a dictionary key for instance ... (couldn't it?) – mgilson Sep 20 '12 at 13:03
  • @mgilson: don't underestimate the power of `if:`/`elif:` etc. logic in `__init__`. Attribute setting order can *easily* alter. Unless the number of attributes is exorbitantly large, I'd not worry about the performance of the `__key` method, really. – Martijn Pieters Sep 20 '12 at 13:08
  • @MartijnPieters -- Yes, you would definitely need to be *very* careful with `if`/`elif` logic. I suppose that if performance suffered from sorting because of a lot of attributes (or lots of calls to `__hash__`), you could always calculate the hash value at the end of `__init__` and have `__hash__` return that attribute as well. I suppose you're right. Premature optimization is almost always a bad idea ... – mgilson Sep 20 '12 at 13:13
  • As a side note, I didn't start this discussion to say that `__key` shouldn't use `sorted` (it should). I started it to inform OP about how dictionaries work ... I just let the discussion continue because I thought I might learn something about creating hashable classes since it's not something I do very often. – mgilson Sep 20 '12 at 13:15
1

You can do it if you assume conventions for your attributes. In your example, it would be quite trivial, as your attributes starts with "attr_" - so you coult write your __key method as :

def __key(self):
    return tuple (getattr(self, attr) for attr in self.__dict__ if attr.startswith("attr_") )

As you can see - any test you can find to put on the filter condition of the generator expression would fit your needs.

A suggestion I can give you is to have your classes using Python's __slots__ feature: that not only will make your attribute names easy to find, as will make your imutable objects more efficient to use and with smaller memory footprint.

class X:
    __slots__ = ("a", "b", "c")
    def __key(self):
        return tuple (getattr(self, attr) for attr in self.__class__.__slots__ )

edit Answering the first comment from the O.P.:

This works with inheritance, of course. If you will always use all of the object's attributes for they, you don't need the "if" part of the expression - write the function as _key (instead of __key which makes a unique name for each class internally) on a class on the top of your hierarchy, and it will work for all your classes.

jsbueno
  • 99,910
  • 10
  • 151
  • 209
  • If you are going to base this off `__dict__` why not just use `__dict__.items()` or `__dict__.values()` and grab the attribute values straight out of the instance dictionary? – Martijn Pieters Sep 20 '12 at 12:34
  • Thanks. I am, unfortunately, working with an existing body of code. I was hoping to make some of the classes hashable without changing them too much. I guess it's doable, but a bit time consuming. – max Sep 20 '12 at 12:35