4

Python doesn't allow non-hashable objects to be used as keys in other dictionaries. As pointed out by Andrey Vlasovskikh, there is a nice workaround for the special case of using non-nested dictionaries as keys:

frozenset(a.items())#Can be put in the dictionary instead

Is there a method of using arbitrary objects as keys in dictionaries?

Example:

How would this be used as a key?

{"a":1, "b":{"c":10}}

It is extremely rare that you will actually have to use something like this in your code. If you think this is the case, consider changing your data model first.

Exact use case

The use case is caching calls to an arbitrary keyword only function. Each key in the dictionary is a string (the name of the argument) and the objects can be quite complicated, consisting of layered dictionaries, lists, tuples, ect.

Related problems

This sub-problem has been split off from the problem here. Solutions here deal with the case where the dictionaries is not layered.

Community
  • 1
  • 1
Casebash
  • 114,675
  • 90
  • 247
  • 350
  • If no-one answers this, then I do actually plan to implement this myself (see Chris Lutz's solution) and will post the solution here. However, feel free to answer – Casebash Oct 23 '09 at 07:41
  • Actually this problem is pretty nasty. You are almost guaranteed to obtain type clashes, ie. one type of dict being different from another :-( – Casebash Oct 23 '09 at 09:13
  • And just storing the type of the class doesn't work as contained types may have types too :-( – Casebash Oct 23 '09 at 09:15
  • 3
    -1: It's not "extremely rare". It's needless. Create your own proper class with a proper `__hash__` function. "arbitrary" shouldn't enter into the conversation. Just define a proper class and eliminate this problem. – S.Lott Oct 23 '09 at 10:50
  • In the end I just changed the data model, but I will get round to posting an answer for this... eventually – Casebash Jan 07 '10 at 11:17
  • 1
    I'm adding my use case: A function that performs like an in memory `memoize`. The problem with a standard `memoize` function is the overhead of talking to a db. If you use `memoize` in memory, there is no easy way to clear the cache when the timeout isn't static. – reubano Jan 26 '15 at 07:24

8 Answers8

7

Based off solution by Chris Lutz again.

import collections

def hashable(obj):
    if isinstance(obj, collections.Hashable):
        items = obj
    elif isinstance(obj, collections.Mapping):
        items = frozenset((k, hashable(v)) for k, v in obj.iteritems())
    elif isinstance(obj, collections.Iterable):
        items = tuple(hashable(item) for item in obj)
    else:
        raise TypeError(type(obj))

    return items
reubano
  • 5,087
  • 1
  • 42
  • 41
flycondor
  • 663
  • 1
  • 6
  • 8
  • This is nice, but it doesn't catch nested unhashables, e.g. `a = {hashable(tuple(([1],[2]))):1}` fails. To fix it, you need to actually `try` hashing, and if it raises, recurse on each item in the list. Specifically, add `try: a = {items:1} except TypeError: return tuple(hashable(elem) for elem in items)` – supergra Nov 08 '18 at 22:11
  • For Python 3, change `iteritems` to `items` – supergra Nov 08 '18 at 22:20
6

Don't. I agree with Andreys comment on the previous question that is doesn't make sense to have dictionaries as keys, and especially not nested ones. Your data-model is obviously quite complex, and dictionaries are probably not the right answer. You should try some OO instead.

Lennart Regebro
  • 167,292
  • 41
  • 224
  • 251
  • 2
    I don't agree that it should be a comment. In my opinion, this is the correct answer. YMMV of course. – Lennart Regebro Oct 23 '09 at 07:56
  • 2
    +1: Just don't. If you think you need this, then your "nested dictionary" should not have been a nested dictionary -- it should have been a proper class with a proper `__hash__` method. Do not write code to process "arbitrary" structures. Write proper classes instead of "arbitrary" structures. – S.Lott Oct 23 '09 at 10:49
  • 1
    Just answer the question. Plus: (1) Memoize decorators for arbitrary functions in scripts that only I use are *incredibly* productive. (2) It's not always "my data model", and adding the obstacle of having to wrap everything I do in a (probably buggy) class is far more obfuscating. It's fair, however, to say "If you do this in code that I have to use, I will hate you forever." :) – supergra Nov 08 '18 at 22:35
4

I agree with Lennart Regebro that you don't. However I often find it useful to cache some function calls, callable object and/or Flyweight objects, since they may use keyword arguments.

But if you really want it, try pickle.dumps (or cPickle if python 2.6) as a quick and dirty hack. It is much faster than any of the answers that uses recursive calls to make items immutable, and strings are hashable.

import pickle
hashable_str = pickle.dumps(unhashable_object)
koo
  • 2,888
  • 1
  • 23
  • 29
  • This is a perfect two-line solution! I disagree that this is a dirty hack. Although using the word `pickle` in your code is embarrassing, it is far simpler and cleaner than any of the recursive approaches, which try to handle every case -- and still don't work for complex inputs. `pickle` by definition handles virtually anything. Your code comment can be simply `# magic hashifier` as opposed to trying to explain all the vagaries of what can and can't be hashed, spread over 10-20 lines. – supergra Nov 08 '18 at 22:27
4

Based off solution by Chris Lutz. Note that this doesn't handle objects that are changed by iteration, such as streams, nor does it handle cycles.

import collections

def make_hashable(obj):
    """WARNING: This function only works on a limited subset of objects
    Make a range of objects hashable. 
    Accepts embedded dictionaries, lists or tuples (including namedtuples)"""
    if isinstance(obj, collections.Hashable):
        #Fine to be hashed without any changes
        return obj
    elif isinstance(obj, collections.Mapping):
        #Convert into a frozenset instead
        items=list(obj.items())
        for i, item in enumerate(items):
                items[i]=make_hashable(item)
        return frozenset(items)
    elif isinstance(obj, collections.Iterable):
        #Convert into a tuple instead
        ret=[type(obj)]
        for i, item in enumerate(obj):
                ret.append(make_hashable(item))
        return tuple(ret)
    #Use the id of the object
    return id(obj)
Casebash
  • 114,675
  • 90
  • 247
  • 350
  • And unlike Chris Lutz's solution (no offense Chris) it actually works. Brilliant. Bonuspoit for using isinstance so as to catch subclasses. – dan mackinlay Dec 16 '10 at 03:32
  • 2
    Some cases that just don't work: i = []; i.append(i); make_hashable(i) # max recursion; make_hashable({1:{}}) – mkorpela Jan 19 '11 at 14:34
  • Also doesn't catch nested things, e.g. `a = {make_hashable(tuple(([1],[2]))):1}` fails. To fix it, you need to actually `try` hashing, and if it `raise`s, recurse on each item in the list. – supergra Nov 08 '18 at 22:09
2

If you really must, make your objects hashable. Subclass whatever you want to put in as a key, and provide a __hash__ function which returns an unique key to this object.

To illustrate:

>>> ("a",).__hash__()
986073539
>>> {'a': 'b'}.__hash__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not callable

If your hash is not unique enough you will get collisions. May be slow as well.

pi.
  • 21,112
  • 8
  • 38
  • 59
  • Not necessarily the best way in general. In my case I get dictionaries back from a 3rd party library. I have to use them as keys. I can't ensure they happen to correspond to any given class. Nor do I wish to use them to create temporary objects just to get create the hash, since it's a once-only operation here. I'd rather call make_hashable, as per Chris Lutz's solution above, once, rather than mess around with custom classes. – dan mackinlay Dec 16 '10 at 03:44
2

I totally disagree with comments & answers saying that this shouldn't be done for data model purity reason.

A dictionary associates an object with another object using the former one as a key. Dictionaries can't be used as keys because they're not hashable. This doesn't make any less meaningful/practical/necessary to map dictionaries to other objects.

As I understand the Python binding system, you can bind any dictionary to a number of variables (or the reverse, depends on your terminology) which means that these variables all know the same unique 'pointer' to that dictionary. Wouldn't it be possible to use that identifier as the hashing key ? If your data model ensures/enforces that you can't have two dictionaries with the same content used as keys then that seems to be a safe technique to me.

I should add that I have no idea whatsoever of how that can/should be done though.

I'm not entirely whether this should be an answer or a comment. Please correct me if needed.

Laurent Giroud
  • 207
  • 2
  • 9
  • Thanks, nekoniaow, now incorporated this into my solution. – Casebash Jan 07 '10 at 11:44
  • Do not forget: They are intentionally not hashable because they are mutable. – pi. Mar 04 '11 at 09:14
  • 1
    So what? A dictionary is an associative mapping from one object to another. You may want this mapping regardless of the object content's mutability. If A needs to be associated to B then we need not care that A can change, the mapping doesn't need to depend solely on this. – Laurent Giroud Mar 05 '11 at 02:46
1

With recursion!

def make_hashable(h):
    items = h.items()
    for item in items:
        if type(items) == dict:
            item = make_hashable(item)
    return frozenset(items)

You can add other type tests for any other mutable types you want to make hashable. It shouldn't be hard.

Community
  • 1
  • 1
Chris Lutz
  • 73,191
  • 16
  • 130
  • 183
  • Actually, I was right in thinking that it was a bit more complex. Need special code for handling tuples, lists, sets... Will probably actually take significant effort to solve this properly. I should also rename the question – Casebash Oct 23 '09 at 07:35
  • Also, I think is should be type(item)==dict – Casebash Oct 23 '09 at 08:49
  • Also setting item to make_hashable(item) doesn't set it in the list – Casebash Oct 23 '09 at 08:52
  • This is an ugly hack which distracts from the meaning of the code. – pi. Oct 23 '09 at 09:02
1

I encountered this issue when using a decorator that caches the results of previous calls based on call signature. I do not agree with the comments/answers here to the effect of "you should not do this", but I think it is important to recognize the potential for surprising and unexpected behavior when going down this path. My thought is that since instances are both mutable and hashable, and it does not seem practical to change that, there is nothing inherently wrong with creating hashable equivalents of non-hashable types or objects. But of course that is only my opinion.

For anyone who requires Python 2.5 compatibility, the below may be useful. I based it on the earlier answer.

from itertools import imap
tuplemap = lambda f, data: tuple(imap(f, data))
def make_hashable(obj):
  u"Returns a deep, non-destructive conversion of given object to an equivalent hashable object"
  if isinstance(obj, list):
    return tuplemap(make_hashable, iter(obj))
  elif isinstance(obj, dict):
    return frozenset(tuplemap(make_hashable, obj.iteritems()))
  elif hasattr(obj, '__hash__') and callable(obj.__hash__):
    try:
      obj.__hash__()
    except:
      if hasattr(obj, '__iter__') and callable(obj.__iter__):
        return tuplemap(make_hashable, iter(obj))
      else:
        raise NotImplementedError, 'object of type %s cannot be made hashable' % (type(obj),)
    else:
      return obj
  elif hasattr(obj, '__iter__') and callable(obj.__iter__):
    return tuplemap(make_hashable, iter(obj))
  else:
    raise NotImplementedError, 'object of type %s cannot be made hashable' % (type, obj)
wberry
  • 18,519
  • 8
  • 53
  • 85