1

The common memoization recipes (like this or these) use dict to store the cache, and therefore require that the function arguments be hashable.

I want the function to work with as many different argument types as possible, and certainly including dict, set, list. What is the best way to achieve that?

One approach I was considering is to wrap all non-hashable arguments into their hashable subclasses (i.e., define a subclass of dict that defines its own __hash__ function).

Alternatively, I was thinking to create a subclass of dict that relies on a different hash function than hash (it's not too hard to define a global my_hash function that recursively works on containers), and use this subclass to store the cache. But I don't think there's an easy way to achieve do that.

EDIT:

I think I will try the solution that I suggested for general hashing of python containers. With that, I should be able to wrap the tuple of (*args, **kwargs) into the automatically hashable class, and use the regular memoization.

Community
  • 1
  • 1
max
  • 49,282
  • 56
  • 208
  • 355
  • Take the called function itself then use that as a hash? Will require a bit of introspection and checking the call stack... but should be doable I think. (Not something I'm going to attempt at 2:30am though :p) – Jon Clements Apr 13 '15 at 01:27
  • @JonClements interesting, I can try. But won't I have the same issue, since the introspection will return nonhashable objects? – max Apr 13 '15 at 01:38
  • @JonClements: Wouldn't that defeat the entire point of memoization? – user2357112 Apr 13 '15 at 02:12
  • Any way you do this, you're either going to have to deep copy the input, or you're going to have to require that nothing passed to your memoized function is ever mutated after it goes in the memo. I suspect having the user of the memoization decorator do the work of putting the function and its arguments into hash-friendly form is going to be overall better than trying to accept non-hashable arguments. – user2357112 Apr 13 '15 at 02:18
  • I don't know how I didn't see that this question [has already been asked](http://stackoverflow.com/questions/4669391/python-anyone-have-a-memoizing-decorator-that-can-handle-unhashable-arguments?rq=1) – max Aug 04 '16 at 22:46

2 Answers2

1

Method 1 (splitting keys and values)

This is based on the idea that dictionaries are just zipped keys and values. With this idea, we can make something like a dictionary to store keys (function arguments) and values (returned values from the function).

Not sure how slow it will be since it uses list.index. Maybe zipping would be faster?

class memoize:
    def __init__(self, func):
        self.func = func
        self.known_keys = []
        self.known_values = []

    def __call__(self, *args, **kwargs):
        key = (args, kwargs)

        if key in self.known_keys:
            i = self.known_keys.index(key)
            return self.known_values[i]
        else:
            value = self.func(*args, **kwargs)
            self.known_keys.append(key)
            self.known_values.append(value)

            return value

It works!:

>>> @memoize
... def whatever(unhashable):
...     print(*unhashable) # Just to know when called for this example
...     return 12345
...
>>> whatever([1, 2, 3, 4])
1 2 3 4
12345
>>> whatever([1, 2, 3, 4])
12345
>>> whatever({"a": "b", "c": "d"})
a c
12345
>>> whatever({"a": "b", "c": "d"})
12345

Method 2 (fake hashes)

class memoize:
    def __init__(self, func):
        self.func = func
        self.known = {}

    def __call__(self, *args, **kwargs):
        key = give_fake_hash((args, kwargs))

        try:
            return self.known[key]
        except KeyError:
            value = self.func(*args, **kwargs)
            self.known[key] = value
            return value

def give_fake_hash(obj):
    cls = type(obj)
    name = "Hashable" + cls.__name__

    def fake_hash(self):
        return hash(repr(self))

    t = type(name, (cls, ), {"__hash__": fake_hash})

    return t(obj)

Method 2.5 (working for dicts)

import operator

class memoize:
    def __init__(self, func):
        self.func = func
        self.known = {}

    def __call__(self, *args, **kwargs):
        key = give_fake_hash((args, kwargs))

        try:
            return self.known[key]
        except KeyError:
            value = self.func(*args, **kwargs)
            self.known[key] = value
            return value

def fake_hash(self):
    return hash(repr(self))

class HashableTuple(tuple):
    __hash__ = fake_hash

class RereprDict(dict):
    def __repr__(self):
        try:
            self._cached_repr
        except AttributeError:
            self._cached_repr = repr(sorted(self.items(), key=operator.itemgetter(0)))
        finally:
            return self._cached_repr

    __hash__ = fake_hash

def fix_args(args):
    for elem in args:
        if isinstance(elem, dict):
            elem = RereprDict(elem)
        yield elem

def give_fake_hash(tup):
    args, kwargs = tup

    args = tuple(fix_args(args))
    kwargs = RereprDict(kwargs)

    return HashableTuple((args, kwargs))
Navith
  • 929
  • 1
  • 9
  • 15
  • Yes it would work, but since memoization is used only when performance is an issue, I'd prefer to keep the benefits of O(1), and highly optimized, python dict. – max Apr 13 '15 at 01:44
  • See method 2 to keep the speed of dicts. – Navith Apr 13 '15 at 03:57
  • `fake_hash` won't work correctly for dictionaries because two dictionaries that evaluate as `==` may show their contents in different order using `repr`, and thus end up with different hashes. – max Apr 14 '15 at 02:35
  • Well, see method 2.5. At this point it's probably not even worth memoizing. – Navith Apr 14 '15 at 04:35
  • Even with `repr(sorted(self.items()))`, you're not guaranteed that the hash is valid, since a dictionary of sets (for example) could still print the values in an arbitrary order... It seems like this is a very annoying problem to solve, I wish python provided some built-in support for this. – max Apr 14 '15 at 10:26
  • Well then, I'm out of ideas besides recursively making every dict in `args` a `RereprDict`, and making `Rerepr*` for every unordered iterable. This is a complex problem. – Navith Apr 15 '15 at 01:38
  • To add another problem, even the keys may not be in the stable order... if they are of type `frozenset`, sorting several of them is unstable, since many pairs of such keys return `False` when compared for inequality in *either* direction. In other words, yes, this is unfortunately a very complex problem. – max Apr 15 '15 at 11:30
1

There's a reason dict/list/set/etc. are not hashable, and it is they are mutable.

That's one of the main reasons their immutable counterparts exist (frozendict/frozenset/tuple). (Well, tuple isn't exactly a frozen-list, but in practice it serves the purpose).

Therefor, for your purpose, use the immutable alternatives.

Here's a quick demonstration of why you shouldn't hash mutable objects. Bare in mind that hashing requires that a==b ==> hash(a)==hash(b).

@memoize
def f(x): ...

d1 = {'a':5}
d2 = {'a':99}
res1 = f(d1)
res2 = f(d2)
d1['a'] = 99
res3 = f(d1)  # what should be returned? not well defined...
shx2
  • 61,779
  • 13
  • 130
  • 153
  • Hashing immutable values only becomes problem, IMO, when you modify objects that are already stored in a hash-based data structure. For example, `d = {d1, d2}`, then modify `d1`: `d1` would end up in the wrong bucket, essentially corrupting the set data structure. That, AFAIK, is the sole reason that mutable values aren't hashable in python. But in your example (and my use case), there seems to be no problem. Of course, `f(d1)` after the change should return the same as `f(d2)` - with or without memoization. And memoization should correctly identify the current value of `d1` as equal to `d2`. – max Apr 14 '15 at 01:57
  • @max right. the only way to achieve that is "freezing" the key before storing it in the cache, which is done by using e.g. a frozendict. – shx2 Apr 14 '15 at 05:35
  • well, if you really want to make sure that no one messes up the key, you're right. but in some cases, it's ok to trust yourself and/or other developers to be careful, and then you are perfectly fine using and hashing dictionary. – max Apr 14 '15 at 10:23