I have a function which, given an argument, calculates a corresponding value and returns it. The return value depends only on the function's parameters, so I'd like to cache (memoize) it somehow. Furthermore, I also want to be able to invalidate a cached value.

It seems to be a common need, so I am trying to avoid reinventing the wheel.

What I'm looking for is an advanced, highly configurable, high-performance library (tool, framework, etc.), and I would like the required changes to my code to be as lean as possible. Some good points are:

  • efficiently handling concurrent requests
  • being able to use different cache backends (e.g. RAM or DB)
  • retaining responsiveness on large scale data

What are some good libraries to use, and how do they compare?

semekh
  • Under what circumstances would you invalidate the cached value if it only depends on the input parameters? – Daenyth May 19 '13 at 16:40
  • http://stackoverflow.com/questions/1988804/what-is-memoization-and-how-can-i-use-it-in-python – Jakub M. May 19 '13 at 16:41
  • @Daenyth Cache invalidation will be performed only on demand (just in case the admin explicitly asks for the value to be recalculated) – semekh May 19 '13 at 16:45
  • @JakubM. I'm asking for a more advanced library or framework. Performance is important, I'd like to avoid NIH syndrome (i.e. use available libraries and frameworks), and would like the changes to be as lean as possible. (Edited the question to reflect these needs.) – semekh May 19 '13 at 16:47
  • @JanneKarila What I'm looking for is an advanced configurable high-performance (probably based on some DB) solution. Could you please stop reminding me about how I should use decorators? – semekh May 19 '13 at 16:59

2 Answers


You can use functools.lru_cache, a simple in-memory cache.

An example of a cached function:

import functools

@functools.lru_cache()
def f(x, y):
    return x+y

print(f(7, 4))
11

Clear the whole cache:

f.cache_clear()
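
You can inspect the cache's state at any time with the documented cache_info() method (the exact numbers depend on what you have called so far):

print(f.cache_info())  # e.g. CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)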

Clear the cache for a particular value (a dirty, dirty hack, because there is no direct access to the cache dict):

def clear_cache_value(cached_function, *args, **kwargs):
    # Find the cache dict in the closure of cache_info; this relies on the
    # pure-Python lru_cache implementation (CPython < 3.5).
    cache = next(c.cell_contents for c in cached_function.cache_info.__closure__
                 if isinstance(c.cell_contents, dict))
    del cache[functools._make_key(args, kwargs, False)]  # private helper, Python 3.3+

clear_cache_value(f, 7, 4)
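
If that is too fragile for your taste, a hand-rolled memoizer that owns its cache dict gives you clean per-key invalidation. A minimal sketch (memoize and its invalidate attribute are made-up names, not standard library API; it handles hashable positional arguments only):

import functools

def memoize(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    def invalidate(*args):
        cache.pop(args, None)  # silently ignore keys that were never cached

    wrapper.invalidate = invalidate
    return wrapper

@memoize
def g(x, y):
    return x + y

g(7, 4)             # computed and cached
g.invalidate(7, 4)  # recomputed on the next call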
Oleh Prypin
  • I'm looking for an advanced solution, not just a solution. Please read the comments on the question. – semekh May 19 '13 at 17:06
  • @ChrisJohnson My question is clear enough, and it has a real-world application. Just because I don't accept unusable answers doesn't mean I'm refusing to accept any. – semekh May 19 '13 at 17:58

This question doesn't make much sense to me. When you start talking about "high performance" and "concurrent requests", you're not really talking about using a Python library within an application -- it sounds more like using (or building) some sort of dedicated, external, specialized service or daemon.

Personally, I use a mixture of memoization and 'lazy loaded' or 'deferred' properties to define cache gets (and computes). By 'lazy loaded', I mean that instead of always pulling (or computing) cached data, I create a proxy object that has all the information needed to call the get/create function against the cache on first access. When it comes to database-backed material, I've also found it useful to group cache misses and consolidate cache gets - this lets me load in parallel when possible (instead of making multiple serial requests).
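
A minimal sketch of such a lazy-loaded property (the lazy_property and Report names are illustrative, not from any particular library): a non-data descriptor computes the value on first access and then caches it on the instance, so later accesses bypass the descriptor entirely.

class lazy_property:
    """Compute the value on first access, then cache it on the instance."""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        value = self.func(obj)           # compute once
        obj.__dict__[self.name] = value  # instance attribute now shadows the descriptor
        return value

class Report:
    @lazy_property
    def expensive_data(self):
        return sum(range(10**6))  # stands in for a cache get/compute

r = Report()
r.expensive_data  # computed here
r.expensive_data  # served from r.__dict__, no recompute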

dogpile.cache takes care of my cache administration (get, set, invalidate); I have it configured to store in memcached or dbm (it supports several backends). I use two lightweight objects (about 12 lines each?) to handle the deferred gets.
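
For reference, a rough sketch of how such a dogpile.cache region is typically wired up (the memcached address is a placeholder; check the dogpile.cache docs for the exact backend arguments):

from dogpile.cache import make_region

# Region backed by memcached; swap the backend string (e.g.
# 'dogpile.cache.dbm') and its arguments to change where values live.
region = make_region().configure(
    'dogpile.cache.memcached',
    expiration_time=3600,
    arguments={'url': '127.0.0.1:11211'},
)

@region.cache_on_arguments()
def f(x, y):
    return x + y  # stands in for an expensive computation

f(7, 4)             # computed once, then served from the backend
f.invalidate(7, 4)  # drop this entry on demand (e.g. by an admin)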

Jonathan Vanasco