6

I need to create a mapping from objects of my own custom class (derived from dict) to objects of another custom class. As I see it there are two ways of doing this:

  1. I can make the objects hashable. I'm not sure how I would do this. I know I can implement __hash__() but I'm unsure how to actually calculate the hash (which should be an integer).

  2. Since my objects can be compared I can make a list [(myobj, myotherobj)] and then implement a lookup which finds the tuple where the first item in the tuple is the same as the lookup key. Implementing this is trivial (the number of objects is small) but I want to avoid reinventing the wheel if something like this already exists in the standard library.

It seems to me that wanting to look up unhashables would be a common problem so I assume someone has already solved this problem. Any suggestions on how to implement __hash()__ for a dict-like object or if there is some other standard way of making lookup tables of unhashables?

pafcu
  • 7,808
  • 12
  • 42
  • 55
  • @delnan: I'm not planning on changing anything afterwards. Is there some way to mark a dict read-only? – pafcu Dec 16 '10 at 15:14
  • In that case, should it actually be a dict? Wouldn't it be better to make it own a dict that isn't modified anywhere except in `__init__`? Anyway, if the keys and values in the dict never change (for one instance of your dict), you could do it like [tuples](http://effbot.org/zone/python-hash.htm). –  Dec 16 '10 at 15:19
  • @pacfu: Overwrite `__delitem__()`, `__setitem()__`, `clear()`, `pop()`, `popitem()`, `setdefault()` and `update()` to make a dict immutable. Admittedly, not very elegant. – Sven Marnach Dec 16 '10 at 15:21

4 Answers4

3

Mappings with mutable objects as keys are generally difficult. Is that really what you want? If you consider your objects to be immutable (there is no way to really enforce immutability in Python), or you know they will not be changed while they are used as keys in a mapping, you can implement your own hash-function for them in several ways. For instance, if your object only has hashable data-members, you can return the hash of a tuple of all data-members as the objects hash.

If your object is a dict-like, you can use the hash of a frozenset of all key-value-pairs.

def __hash__(self):
    return hash(frozenset(self.iteritems()))

This only works if all values are hashable. In order to save recalculations of the hashes (which would be done on every lookup), you can cache the hash-value and just recalculate it if some dirty-flag is set.

Björn Pollex
  • 75,346
  • 28
  • 201
  • 283
  • Caching the hash has the added advantage of ensuring it doesn't change - even if the contents of the object change, if you never update the hash then it becomes effectively immutable. – Mark Ransom Dec 16 '10 at 15:30
  • @Mark: But this could break a fundamental rule of hashes, namely that *equal values should have equal hashes*. Only do this if you know exactly what your are getting yourself into. – Björn Pollex Dec 16 '10 at 15:37
  • 1
    @Mark: There is only requirement [`__hash__()`](http://docs.python.org/reference/datamodel.html#object.__hash__) must fulfil: If two objects compare equal, they must have the same hash. By following your proposal, this requirement would be violated. – Sven Marnach Dec 16 '10 at 15:38
  • @Sven Marnach, fair enough. Perhaps the proper behavior would be to raise an exception if the calculated hash ever becomes different than the cached one, but of course this requires recalculating the hash each time. – Mark Ransom Dec 16 '10 at 15:43
2

A simple solution seems to be to do lookup[id(myobj)] = myotherobj instead of lookup[myobj] = myotherobj. Any commente on this approach?

pafcu
  • 7,808
  • 12
  • 42
  • 55
  • 1
    If these are the semantics you want: fine. Note that one usually expects that `j == k` implies `lookup[j] == lookup[k]` -- a property you would lose with your approach. Equivalently, you could make `__hash__()` return `hash(id(self))` and `__eq__()` return `id(self) == id(other)`. – Sven Marnach Dec 16 '10 at 16:06
1

The following should work if you're not storing any additional unhashable objects in your custom class:

def __hash__(self):
    return hash(self.items())
Pär Wieslander
  • 28,374
  • 7
  • 55
  • 54
1

Here is an implementation of a frozendict, taken from http://code.activestate.com/recipes/414283/:

class frozendict(dict):
    def _blocked_attribute(obj):
        raise AttributeError, "A frozendict cannot be modified."
    _blocked_attribute = property(_blocked_attribute)

    __delitem__ = __setitem__ = clear = _blocked_attribute
    pop = popitem = setdefault = update = _blocked_attribute

    def __new__(cls, *args):
        new = dict.__new__(cls)
        dict.__init__(new, *args)
        return new

    def __init__(self, *args):
        pass

    def __hash__(self):
        try:
            return self._cached_hash
        except AttributeError:
            h = self._cached_hash = hash(tuple(sorted(self.items())))
            return h

    def __repr__(self):
        return "frozendict(%s)" % dict.__repr__(self)

I would replace tuple(sorted(self.items())) by frozenset(self.iteritems()) as in Spacecowboy's answer. And consider adding __slots__ = ("_cached_hash",) to the class.

Community
  • 1
  • 1
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841