4

I am looking for a way of building a decorator @memoize that I can use in functions as follows:

@memoize
def my_function(a, b, c):
    # Do stuff
    # result may not always be the same for fixed (a, b, c)
    return result

Then, if I do:

result1 = my_function(a=1,b=2,c=3)
# my_function runs (slow). We cache the result for later

result2 = my_function(a=1, b=2, c=3)
# The decorator reads the cache and returns the result (fast)

Now say that I want to force a cache update:

result3 = my_function(a=1, b=2, c=3, force_update=True)
# The function runs *again* for values a, b, and c. 

result4 = my_function(a=1, b=2, c=3)
# We read the cache

At the end of the above, we always have result4 == result3, but not necessarily result4 == result1, which is why one needs an option to force a cache update for the same input parameters.

How can I approach this problem?

Note on joblib

As far as I know joblib supports .call, which forces a re-run, but it does not update the cache.

Follow-up on using klepto:

Is there any way to have klepto (see @Wally's answer) cache its results by default under a specific location (e.g. /some/path/) and share this location across multiple functions? E.g. I would like to say

cache_path = "/some/path/"

and then @memoize several functions in a given module under the same path.

Amelio Vazquez-Reina
  • 1
    `functools.lru_cache` is a memoization decorator that provides a way to clear the entire cache (but not a specific call like in your example). – interjay Feb 04 '15 at 00:15
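
To illustrate the comment above, here is a minimal sketch of what `functools.lru_cache` offers from the standard library. The `calls` list and the body of `my_function` are hypothetical and only there to show when the function actually runs. Note that `cache_clear()` empties the whole cache, so it is coarser than the per-call `force_update` the question asks for.

```python
import functools

calls = []  # hypothetical call log, only used to show when the body runs


@functools.lru_cache(maxsize=None)
def my_function(a, b, c):
    calls.append((a, b, c))
    return a + b + c


my_function(1, 2, 3)       # runs the body
my_function(1, 2, 3)       # served from the cache
my_function.cache_clear()  # drops every cached entry, not just (1, 2, 3)
my_function(1, 2, 3)       # runs the body again
print(len(calls))          # 2
```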

4 Answers

4

I would suggest looking at joblib and klepto. Both have very configurable caching algorithms, and may do what you want.

Both definitely can do the caching for result1 and result2, and klepto provides access to the cache, so one can pop a result from the local memory cache (without removing it from a stored archive, say in a database).

>>> import klepto
>>> from klepto import lru_cache as memoize
>>> from klepto.keymaps import hashmap
>>> hasher = hashmap(algorithm='md5')
>>> @memoize(keymap=hasher)
... def squared(x):
...   print("called")
...   return x**2
... 
>>> squared(1)
called
1
>>> squared(2)
called
4
>>> squared(3)
called
9
>>> squared(2)
4
>>> 
>>> cache = squared.__cache__()
>>> # delete the 'key' for x=2
>>> cache.pop(squared.key(2))
4
>>> squared(2)
called
4

Not exactly the keyword interface you were looking for, but it has the functionality you are looking for.

  • Thanks Wally. I already use `joblib` and couldn't find a way of doing what I ask (the closest thing is `.call`, but as I mention in my note, it doesn't do what I need). I will look into `klepto`. – Amelio Vazquez-Reina Feb 04 '15 at 00:19
  • 1
    I see your edit. I believe `klepto` does what you are looking for as it provides a `dict` interface to any cache, and also provides a `pop` method for the cache -- it provides all `dict` methods, actually. –  Feb 04 '15 at 00:24
  • This is very helpful. Thanks Wally. It looks like a more complete library for memoization specifically than joblib. Great to know. On this note, do you happen to know how to do the above with **disk** persistence? – Amelio Vazquez-Reina Feb 04 '15 at 00:48
  • 2
    I believe that `klepto.archives.dir_archive` creates an archive on disk with one entry per file, while `klepto.archives.file_archive` creates an archive on disk with all entries in a single file. Additionally, you can use one of the `SQL` archives to write to disk. –  Feb 04 '15 at 01:39
  • 1
    See http://stackoverflow.com/a/21447994/4482921. I also remember another better example on SO, but can't seem to find the link offhand. –  Feb 04 '15 at 01:42
  • I have added a note to the OP to fully clarify this. I have notified Mike McKerns in case he can shed some light on this. – Amelio Vazquez-Reina Feb 04 '15 at 02:17
2

You can do something like this:

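import functools
import pickle  # use cPickle instead on Python 2 for speed


def memoize(func):
    cache = {}

    @functools.wraps(func)
    def decorator(*args, **kwargs):
        # force_update is consumed here and never reaches func
        force_update = kwargs.pop('force_update', False)
        # Sort keyword arguments so f(a=1, b=2) and f(b=2, a=1) share a key
        key = pickle.dumps((args, sorted(kwargs.items())))
        if force_update or key not in cache:
            res = func(*args, **kwargs)
            cache[key] = res
        else:
            res = cache[key]
        return res
    return decorator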

The decorator accepts an extra parameter, force_update (you don't need to declare it in your function), and pops it from kwargs. So if you haven't called the function with these parameters before, OR you pass force_update=True, the function will be called:

@memoize
def f(a=0, b=0, c=0):
    import random
    return [a, b, c, random.randint(1, 10)]


>>> print(f(a=1, b=2, c=3))
[1, 2, 3, 9]
>>> print(f(a=1, b=2, c=3))  # value will be taken from the cache
[1, 2, 3, 9]
>>> print(f(a=1, b=2, c=3, force_update=True))
[1, 2, 3, 2]
>>> print(f(a=1, b=2, c=3))  # value will be taken from the cache as well
[1, 2, 3, 2]
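
One caveat with keying on the raw `(args, kwargs)` pair is that `f(1, 2, 3)` and `f(a=1, b=2, c=3)` end up under different keys. The sketch below is one possible variation, not part of the original answer: it binds the call to the function signature with `inspect.signature` before building the key. The `counter` dict is hypothetical and only tracks how often the body runs. It assumes the pickled form is stable for equal arguments, which holds for simple values such as the ints used here.

```python
import functools
import inspect
import pickle


def memoize(func):
    cache = {}
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        force_update = kwargs.pop('force_update', False)
        # Bind to the signature so positional and keyword calls agree
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        key = pickle.dumps(sorted(bound.arguments.items()))
        if force_update or key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]
    return wrapper


counter = {'calls': 0}  # hypothetical counter, for illustration only


@memoize
def f(a=0, b=0, c=0):
    counter['calls'] += 1
    return (a, b, c, counter['calls'])


first = f(1, 2, 3)                     # runs the body
same = f(c=3, b=2, a=1)                # same key, read from the cache
fresh = f(1, 2, 3, force_update=True)  # runs the body again
```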
Pavel Reznikov
1

If you want to do it yourself:

def memoize(func):
    cache = {}
    def cacher(a, b, c, force_update=False):
        if force_update or (a, b, c) not in cache:
            cache[(a, b, c)] = func(a, b, c)
        return cache[(a, b, c)]
    return cacher
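
The decorator above is tied to a three-argument signature. A possible generalization, sketched here under the assumption that every argument value is hashable, keeps the plain tuple key and makes `force_update` a keyword-only parameter. The `runs` list is hypothetical and only records when the wrapped function executes.

```python
import functools


def memoize(func):
    cache = {}

    @functools.wraps(func)
    def cacher(*args, force_update=False, **kwargs):
        # Hashable key; assumes every argument value is hashable
        key = (args, frozenset(kwargs.items()))
        if force_update or key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]
    return cacher


runs = []  # hypothetical log of real executions


@memoize
def my_function(a, b, c):
    runs.append((a, b, c))
    return len(runs)


result1 = my_function(1, 2, 3)                     # 1 (function runs)
result2 = my_function(1, 2, 3)                     # 1 (cache)
result3 = my_function(1, 2, 3, force_update=True)  # 2 (function runs again)
result4 = my_function(1, 2, 3)                     # 2 (updated cache)
```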
horns
1

This is purely with regard to the follow-up question for klepto

The following extends @Wally's example to specify a directory:

>>> import klepto
>>> from klepto import lru_cache as memoize
>>> from klepto.keymaps import hashmap
>>> from klepto.archives import dir_archive
>>> hasher = hashmap(algorithm='md5')
>>> dir_cache = dir_archive('/tmp/some/path/squared')
>>> dir_cache2 = dir_archive('/tmp/some/path/tripled')
>>> @memoize(keymap=hasher, cache=dir_cache)
... def squared(x):
...   print("called")
...   return x**2
>>> 
>>> @memoize(keymap=hasher, cache=dir_cache2)
... def tripled(x):
...   print('called')
...   return 3*x
>>>

Alternatively, you could use a file_archive, where you specify the path as:

cache = file_archive('/tmp/some/path/file.py') 
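
For comparison, the same idea (several functions caching to disk under one shared location, with a forced refresh) can be sketched with the standard library alone. This is not klepto's API: it swaps in `shelve` for the archive, and the `memoize` decorator, the `force_update` keyword, and the use of a temporary directory as a stand-in for `/some/path/` are all assumptions made for illustration.

```python
import functools
import hashlib
import os
import pickle
import shelve
import tempfile

# Stand-in for "/some/path/"; a temporary directory keeps the sketch runnable
cache_path = tempfile.mkdtemp()


def memoize(func):
    # One shelf per function, all stored under the shared cache_path
    shelf_file = os.path.join(cache_path, func.__name__)

    @functools.wraps(func)
    def wrapper(*args, force_update=False, **kwargs):
        raw = pickle.dumps((args, sorted(kwargs.items())))
        key = hashlib.sha256(raw).hexdigest()  # shelve keys must be strings
        with shelve.open(shelf_file) as shelf:
            if force_update or key not in shelf:
                shelf[key] = func(*args, **kwargs)
            return shelf[key]
    return wrapper


@memoize
def squared(x):
    print("called")
    return x ** 2


@memoize
def tripled(x):
    print("called")
    return 3 * x


squared(2)                     # prints "called", result written to disk
squared(2)                     # read back from disk, nothing printed
squared(2, force_update=True)  # prints "called" again and overwrites the entry
tripled(2)                     # separate shelf under the same cache_path
```

Opening the shelf on every call keeps the sketch short but is slow for hot paths; a real implementation would likely keep an in-memory layer in front of the disk store, which is what klepto's cache-plus-archive design provides.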
Mike McKerns
  • Great. Thanks @Mike! I will read the docs, but what are the assumptions behind the particular keymap you used? i.e. `keymap=hashmap(algorithm='md5')` – Amelio Vazquez-Reina Feb 04 '15 at 02:58
  • I'm not sure what you mean by "assumptions"... but maybe this link will clarify: https://github.com/uqfoundation/klepto/blob/master/klepto/keymaps.py, and the set of "encodings" can be found here: https://github.com/uqfoundation/klepto/blob/master/klepto/crypto.py I used the keymap that was being used in the given example. Let me know if the above doesn't answer what you were asking. – Mike McKerns Feb 04 '15 at 03:55