39

I have a method that takes (among others) a dictionary as an argument. The method parses strings, and the dictionary provides replacements for some substrings, so it doesn't have to be mutable.

This function is called quite often, and on redundant elements, so I figured that caching it would improve its efficiency.

But, as you may have guessed, since dict is mutable and thus not hashable, @functools.lru_cache can't decorate my function. So how can I overcome this?
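For illustration, a minimal sketch of the failure (the function below is made up):

import functools

@functools.lru_cache()
def substitute(string, replacements):
    for old, new in replacements.items():
        string = string.replace(old, new)
    return string

substitute("foo bar", {"foo": "baz"})  # TypeError: unhashable type: 'dict'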

Bonus points if it needs only standard library classes and methods. Ideally, if some kind of frozendict exists in the standard library that I have overlooked, it would make my day.

PS: namedtuple only as a last resort, since it would need a big syntax shift.

Evpok
  • 4,273
  • 3
  • 34
  • 46
  • Maybe this can help: http://stackoverflow.com/questions/4669391/python-anyone-have-a-memoizing-decorator-that-can-handle-unhashable-arguments – mouad Jun 15 '11 at 13:36
  • I hadn't seen this, but it doesn't really help. Writing a cache decorator from scratch isn't worth the effort here and I'd like to stick to the standard library. Thank you anyway :) – Evpok Jun 15 '11 at 13:40
How about subclassing `namedtuple` and adding access by `x["key"]`? This will probably be just a few lines of code. – Sven Marnach Jun 15 '11 at 13:42
The only way I know to get named tuples is by calling the factory `collections.namedtuple`, which returns a `type`, so if I want to add a `__getitem__` to a named tuple I'd have to do it dynamically, which shouldn't be possible, and even if it is, it's really ugly. Is there another way to do this? – Evpok Jun 15 '11 at 13:59
  • 1
    @Evpok: Just subclass the type returned by `namedtuple()`: `class X(namedtuple("Y", "a b c")): ...`. – Sven Marnach Jun 15 '11 at 14:32
  • @Sven Marnach Hadn't thought of this. Nice :) – Evpok Jun 15 '11 at 14:38

9 Answers

32

Instead of writing a custom hashable dictionary, use this and avoid reinventing the wheel: it's a frozen dictionary, and it's hashable.

https://pypi.org/project/frozendict/

Code:

import functools

from frozendict import frozendict


def freezeargs(func):
    """Convert mutable dict arguments into immutable frozendicts.

    Useful to make the function compatible with lru_cache.
    """

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        args = tuple(frozendict(arg) if isinstance(arg, dict) else arg for arg in args)
        kwargs = {k: frozendict(v) if isinstance(v, dict) else v for k, v in kwargs.items()}
        return func(*args, **kwargs)
    return wrapped

and then

@freezeargs
@functools.lru_cache()
def func(...):
    pass

Code adapted from @fast_cen's answer.

Note: this does not work on nested data structures; for example, an argument might contain a list, which is unhashable. You are invited to make the wrapping recursive, so that it goes deep into the data structure and turns every dict into a frozendict and every list into a tuple.
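For instance, with the decorator above (func is a made-up name), a nested list still breaks the hashing:

@freezeargs
@functools.lru_cache()
def func(replacements):
    pass

func({"foo": ["b", "a", "r"]})  # TypeError: the list value is unhashable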

(I know the OP no longer needs a solution, but I came here looking for the same thing, so I'm leaving this for future generations.)

Cedar
  • 748
  • 6
  • 21
22

Here is a decorator that uses @mhyfritz's trick.

import functools


def hash_dict(func):
    """Convert mutable dict arguments into immutable, hashable ones.

    Useful to make the function compatible with lru_cache.
    """
    class HDict(dict):
        def __hash__(self):
            return hash(frozenset(self.items()))

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        args = tuple(HDict(arg) if isinstance(arg, dict) else arg for arg in args)
        kwargs = {k: HDict(v) if isinstance(v, dict) else v for k, v in kwargs.items()}
        return func(*args, **kwargs)
    return wrapped

Simply add it before your lru_cache.

@hash_dict
@functools.lru_cache()
def your_function():
    ...
fast_cen
  • 1,297
  • 3
  • 11
  • 28
  • Doesn't let you call `clear_cache()` or `cache_info()` on the decorated function. – Nicholas Tulach Oct 04 '17 at 17:19
  • 8
To provide `cache_clear()` and `cache_info()` calls, just add these functions to `wrapped` before the return. Something like `wrapped.cache_info = func.cache_info` and `wrapped.cache_clear = func.cache_clear` – rcsalvador May 28 '18 at 23:12
11

What about creating a hashable dict class like so:

class HDict(dict):
    def __hash__(self):
        return hash(frozenset(self.items()))

substs = HDict({'foo': 'bar', 'baz': 'quz'})
cache = {substs: True}
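A hedged sketch of how this combines with lru_cache (the function is made up):

import functools

@functools.lru_cache()
def substitute(string, substs):
    for old, new in substs.items():
        string = string.replace(old, new)
    return string

substitute("foo baz", substs)  # works: HDict is hashable, so the call is cached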
mhyfritz
  • 8,342
  • 2
  • 29
  • 29
  • 2
Works like a charm, though only if the items of the dict are all hashable, which is my case. However, do you have a trick to deal with non-hashable `self.items()`? – Evpok Jun 15 '11 at 14:49
  • 3
No _easy_ way comes to my mind right now. You could of course recurse down the dict and convert mutables along the way (dicts to frozensets of items, lists to tuples, etc.)... – mhyfritz Jun 15 '11 at 15:00
3

How about subclassing namedtuple and adding access by x["key"]?

class X(namedtuple("Y", "a b c")):
    def __getitem__(self, item):
        if isinstance(item, int):
            return super(X, self).__getitem__(item)
        return getattr(self, item)
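A quick usage sketch (the values are arbitrary):

x = X(1, 2, 3)
x.a       # 1, attribute access as usual
x["b"]    # 2, dict-style access via the new __getitem__
x[0]      # 1, positional access still works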
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
Nice too, but since the `keys` are user-defined, it would force me to define this as an inner class of my calling method, which I want to avoid. Great idea, though, and probably useful someday. – Evpok Jun 15 '11 at 15:16
3

Based on @Cedar's answer, adding recursion for a deep freeze as suggested:

def deep_freeze(thing):
    from collections.abc import Collection, Mapping, Hashable
    from frozendict import frozendict
    if thing is None or isinstance(thing, str):
        return thing
    elif isinstance(thing, Mapping):
        return frozendict({k: deep_freeze(v) for k, v in thing.items()})
    elif isinstance(thing, Collection):
        return tuple(deep_freeze(i) for i in thing)
    elif not isinstance(thing, Hashable):
        raise TypeError(f"unfreezable type: '{type(thing)}'")
    else:
        return thing


def deep_freeze_args(func):
    import functools

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        return func(*deep_freeze(args), **deep_freeze(kwargs))
    return wrapped
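A possible way to use it, assuming the usual stacking with lru_cache (the function name is made up):

import functools

@deep_freeze_args
@functools.lru_cache(maxsize=None)
def lookup(config):
    ...

lookup({"replacements": {"foo": "bar"}, "flags": ["x", "y"]})  # nested dict and list are frozen first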
Boris Lopez
  • 516
  • 8
  • 14
2

Here's a decorator that can be used like functools.lru_cache, but it is targeted at functions that take only one argument, which must be a flat mapping with hashable values, and it has a fixed maxsize of 64. For your use case, you would have to adapt either this example or your client code. Also, to make maxsize configurable, one would have to implement another decorator, but I haven't wrapped my head around this since I didn't need it.

from functools import _CacheInfo, _lru_cache_wrapper, partial, update_wrapper
from typing import Any, Callable, Dict, Hashable

def lru_dict_arg_cache(func: Callable) -> Callable:
    def unpacking_func(func: Callable, arg: frozenset) -> Any:
        return func(dict(arg))

    _unpacking_func = partial(unpacking_func, func)
    _cached_unpacking_func = \
        _lru_cache_wrapper(_unpacking_func, 64, False, _CacheInfo)

    def packing_func(arg: Dict[Hashable, Hashable]) -> Any:
        return _cached_unpacking_func(frozenset(arg.items()))

    update_wrapper(packing_func, func)
    packing_func.cache_info = _cached_unpacking_func.cache_info
    return packing_func


@lru_dict_arg_cache
def uppercase_keys(arg: dict) -> dict:
    """ Yelling keys. """
    return {k.upper(): v for k, v in arg.items()}


assert uppercase_keys.__name__ == 'uppercase_keys'
assert uppercase_keys.__doc__ == ' Yelling keys. '
assert uppercase_keys({'ham': 'spam'}) == {'HAM': 'spam'}
assert uppercase_keys({'ham': 'spam'}) == {'HAM': 'spam'}
cache_info = uppercase_keys.cache_info()
assert cache_info.hits == 1
assert cache_info.misses == 1
assert cache_info.maxsize == 64
assert cache_info.currsize == 1
assert uppercase_keys({'foo': 'bar'}) == {'FOO': 'bar'}
assert uppercase_keys({'foo': 'baz'}) == {'FOO': 'baz'}
cache_info = uppercase_keys.cache_info()
assert cache_info.hits == 1
assert cache_info.misses == 3
assert cache_info.currsize == 3

For a more generic approach, one could use the decorator @cachetools.cached from a third-party library, with an appropriate function set as the key.
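A hedged sketch of that idea, assuming cachetools' `cached` decorator and `LRUCache` class (the key function simply freezes the dict's items):

from cachetools import LRUCache, cached

@cached(cache=LRUCache(maxsize=64), key=lambda arg: frozenset(arg.items()))
def uppercase_keys(arg):
    return {k.upper(): v for k, v in arg.items()}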

funky-future
  • 3,716
  • 1
  • 30
  • 43
2

After deciding to drop lru_cache for our use case for now, we still came up with a solution. This decorator uses json to serialise and deserialise the args/kwargs sent to the cache. It works with any number of args. Use it as a decorator on a function instead of @lru_cache. The max size is set to 1024.

import json
from functools import lru_cache, wraps


def hashable_lru(func):
    cache = lru_cache(maxsize=1024)

    def deserialise(value):
        try:
            return json.loads(value)
        except Exception:
            return value

    def func_with_serialized_params(*args, **kwargs):
        _args = tuple([deserialise(arg) for arg in args])
        _kwargs = {k: deserialise(v) for k, v in kwargs.items()}
        return func(*_args, **_kwargs)

    cached_function = cache(func_with_serialized_params)

    @wraps(func)
    def lru_decorator(*args, **kwargs):
        _args = tuple([json.dumps(arg, sort_keys=True) if type(arg) in (list, dict) else arg for arg in args])
        _kwargs = {k: json.dumps(v, sort_keys=True) if type(v) in (list, dict) else v for k, v in kwargs.items()}
        return cached_function(*_args, **_kwargs)
    lru_decorator.cache_info = cached_function.cache_info
    lru_decorator.cache_clear = cached_function.cache_clear
    return lru_decorator
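A short usage sketch (the function name is made up):

@hashable_lru
def parse(substitutions):
    return {k.upper(): v for k, v in substitutions.items()}

parse({"foo": "bar"})  # cache miss
parse({"foo": "bar"})  # equal dicts serialise to the same JSON string, so this is a cache hit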
Harel
  • 1,989
  • 3
  • 26
  • 44
1

The solution might be much simpler. lru_cache uses the parameters as the cache key, and since a dictionary isn't hashable, lru_cache can't do anything with it. You can serialise the dictionary parameter to a string before the call and deserialise it back to a dictionary inside the function. Works like a charm.

the function:

    import json
    from functools import lru_cache

    @lru_cache(1024)
    def data_check(serialized_dictionary):
        my_dictionary = json.loads(serialized_dictionary)
        print(my_dictionary)

the call:

    data_check(json.dumps(initial_dictionary))
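One caveat: json.dumps preserves insertion order by default, so two dicts with the same content but different key order produce different strings and therefore different cache entries; passing sort_keys=True normalises the key:

    data_check(json.dumps(initial_dictionary, sort_keys=True))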
Gregory R
  • 11
  • 1
0

An extension of @Cedar's answer, adding a recursive freeze:

the recursive freeze:

from frozendict import frozendict


def recursive_freeze(value):
    """Recursively turn nested dicts into frozendicts, without mutating the input."""
    if isinstance(value, dict):
        return frozendict({k: recursive_freeze(v) for k, v in value.items()})
    return value


# To unfreeze
def recursive_unfreeze(value):
    """Recursively turn frozendicts back into plain dicts."""
    if isinstance(value, frozendict):
        return {k: recursive_unfreeze(v) for k, v in value.items()}
    return value

the decorator:

import functools


def freezeargs(func):
    """
    Transform mutable dictionary arguments into immutable ones.
    Useful to be compatible with cache.
    """

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        args = tuple(recursive_freeze(arg) if isinstance(arg, dict) else arg for arg in args)
        kwargs = {k: recursive_freeze(v) if isinstance(v, dict) else v for k, v in kwargs.items()}
        return func(*args, **kwargs)
    return wrapped