
I am fairly new to the concepts of caching & memoization. I've read some other discussions & resources here, here, and here, but haven't been able to follow them all that well.

Say that I have two member functions within a class. (Simplified example below.) Pretend that the first function, total, is computationally expensive. The second function, subtotal, is computationally simple in itself, but it uses the return value of the first function and so becomes expensive too, because it currently has to re-call total to get that result.

I want to cache the result of the first function and use it as the input to the second whenever the input y to subtotal matches the input x of a recent call to total. That is:

  • If calling subtotal() where y is equal to the value of x in a
    previous call of total, then use that cached result instead of
    re-calling total.
  • Otherwise, just call total() using x = y.

Example:

class MyObject(object):

    def __init__(self, a, b):
        self.a, self.b = a, b

    def total(self, x):
        return (self.a + self.b) * x     # some time-expensive calculation

    def subtotal(self, y, z):
        return self.total(x=y) + z       # Don't want to have to re-run total() here
                                         # IF y == x from a recent call of total(),
                                         # otherwise, call total().
Brad Solomon
  • Have you tried this: http://stackoverflow.com/a/18723434/2570677. I've used it in my code and it works perfectly. – Liran Funaro May 19 '17 at 19:31
  • Assuming you're referring to `@functools.lru_cache`? – Brad Solomon May 19 '17 at 19:38
  • from the resources you link to, what is stopping you from just decorating `total` with a basic cache feature? you'd just put `@functools.lru_cache(maxsize=N)` and it'd cache `N` results for the same arguments. Why is that not going to work in your scenario? – Tadhg McDonald-Jensen May 19 '17 at 19:39
  • @BradSolomon I'm referring to the answer containing the implementation (without any external module). It works in python 2.7. – Liran Funaro May 19 '17 at 19:45
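For reference, the basic pattern suggested in the comments looks like this on a plain function (a minimal sketch; the function name and `maxsize` value are illustrative):

```python
import functools

@functools.lru_cache(maxsize=128)
def expensive(x):
    return x * x   # imagine something slow here

expensive(3)   # computed on the first call
expensive(3)   # served from the cache on the second
```

`lru_cache` keys the cache on the argument values, so repeated calls with the same arguments skip the function body entirely.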

3 Answers


With Python 3.2 or newer, you could use functools.lru_cache. If you were to decorate total with functools.lru_cache directly, then lru_cache would cache the return values of total based on the values of both arguments, self and x. Since the cache lives on the class-level function and stores a strong reference to self, applying @lru_cache directly to a method keeps instances of the class alive for the lifetime of the class (hence a memory leak).

Here is a workaround which allows you to use lru_cache with class methods -- it caches results based on all arguments other than the first one, self, and uses a weakref to avoid the circular reference problem:

import functools
import weakref

def memoized_method(*lru_args, **lru_kwargs):
    """
    https://stackoverflow.com/a/33672499/190597 (orly)
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapped_func(self, *args, **kwargs):
            # We're storing the wrapped method inside the instance. If we had
            # a strong reference to self the instance would never die.
            self_weak = weakref.ref(self)
            @functools.wraps(func)
            @functools.lru_cache(*lru_args, **lru_kwargs)
            def cached_method(*args, **kwargs):
                return func(self_weak(), *args, **kwargs)
            setattr(self, func.__name__, cached_method)
            return cached_method(*args, **kwargs)
        return wrapped_func
    return decorator


class MyObject(object):

    def __init__(self, a, b):
        self.a, self.b = a, b

    @memoized_method()
    def total(self, x):
        print('Calling total (x={})'.format(x))
        return (self.a + self.b) * x


    def subtotal(self, y, z):
        return self.total(x=y) + z 

mobj = MyObject(1,2)
mobj.subtotal(10, 20)
mobj.subtotal(10, 30)

prints

Calling total (x=10)

only once.
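The leak mentioned above is easy to demonstrate (a minimal sketch; the class name `Leaky` is illustrative):

```python
import functools
import gc
import weakref

class Leaky(object):
    @functools.lru_cache(maxsize=None)   # cache lives on the class-level function
    def total(self, x):
        return x * 2

obj = Leaky()
obj.total(1)                  # the cache now holds a strong reference to obj
ref = weakref.ref(obj)
del obj
gc.collect()
print(ref() is None)          # False -- the instance was never collected
```

With the `memoized_method` workaround above, the closure only holds a `weakref` to the instance, so the same sequence would let the instance be collected.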


Alternatively, this is how you could roll your own cache using a dict:

class MyObject(object):

    def __init__(self, a, b):
        self.a, self.b = a, b
        self._total = dict()

    def total(self, x):
        print('Calling total (x={})'.format(x))
        self._total[x] = t = (self.a + self.b) * x
        return t

    def subtotal(self, y, z):
        t = self._total[y] if y in self._total else self.total(y)
        return t + z 

mobj = MyObject(1,2)
mobj.subtotal(10, 20)
mobj.subtotal(10, 30)

One advantage of lru_cache over this dict-based cache is that the lru_cache is thread-safe. The lru_cache also has a maxsize parameter which can help protect against memory usage growing without bound (for example, due to a long-running process calling total many times with different values of x).
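The eviction behaviour of `maxsize` can be inspected with `cache_info()` (a small sketch; the function and values are illustrative):

```python
import functools

@functools.lru_cache(maxsize=2)
def total(x):
    return 3 * x

for x in (1, 2, 3, 1):
    total(x)

# calling with 3 evicted the entry for 1, so the final call with 1 misses again
print(total.cache_info())   # hits=0, misses=4, currsize=2
```

This is what keeps memory bounded in a long-running process: the cache never holds more than `maxsize` entries, discarding the least recently used.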

unutbu

Thank you all for the responses, it was helpful just to read them and see what's going on under the hood. As @Tadhg McDonald-Jensen said, it seems I didn't need anything more here than @functools.lru_cache. (I'm in Python 3.5.) Regarding @unutbu's comment, I'm not getting an error from decorating total() with @lru_cache. Let me correct my own example; I'll keep it up here for other beginners:

from functools import lru_cache
from datetime import datetime as dt
import numpy as np

class MyObject(object):
    def __init__(self, a, b):
        self.a, self.b = a, b

    @lru_cache(maxsize=None)
    def total(self, x):        
        lst = []
        for i in range(int(1e7)):
            val = self.a + self.b + x    # time-expensive loop
            lst.append(val)
        return np.array(lst)     

    def subtotal(self, y, z):
        return self.total(x=y) + z       # if y==x from a previous call of
                                         # total(), use the cached result.

myobj = MyObject(1, 2)

# Call total() with x=20
a = dt.now()
myobj.total(x=20)
b = dt.now()
c = (b - a).total_seconds()

# Call subtotal() with y=21
a2 = dt.now()
myobj.subtotal(y=21, z=1)
b2 = dt.now()
c2 = (b2 - a2).total_seconds()

# Call subtotal() with y=20 - should take substantially less time
# with x=20 used in previous call of total().
a3 = dt.now()
myobj.subtotal(y=20, z=1)
b3 = dt.now()
c3 = (b3 - a3).total_seconds()

print('c: {}, c2: {}, c3: {}'.format(c, c2, c3))
# c: 2.469753, c2: 2.355764, c3: 0.016998
Brad Solomon
  • Do `self.a` and `self.b` ever change? If so, the cached value should be cleared since the computed value of `total` would change. You could implement that by making `a` and `b` settable properties whose setter calls [`total.cache_clear()`](http://stackoverflow.com/a/37654201/190597). – unutbu May 19 '17 at 20:51
  • PS: I was wrong about applying lru_cache to a class method raising an error. Although there is no error, [it does cause a memory leak](http://stackoverflow.com/q/33672412/190597) however. – unutbu May 19 '17 at 20:53
  • Here `self.a` and `self.b` won't be modified. But thanks, that's helpful to know. – Brad Solomon May 19 '17 at 21:04
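For anyone who does need `self.a` or `self.b` to change, unutbu's suggestion could be sketched like this (a minimal sketch, assuming only `a` needs to be settable; note that `cache_clear()` empties the cache for every instance, since the cache lives on the class-level function):

```python
from functools import lru_cache

class MyObject(object):
    def __init__(self, a, b):
        self._a, self.b = a, b

    @property
    def a(self):
        return self._a

    @a.setter
    def a(self, value):
        self._a = value
        self.total.cache_clear()   # stale cached totals would be wrong now

    @lru_cache(maxsize=None)
    def total(self, x):
        return (self._a + self.b) * x
```

Without the `cache_clear()` call in the setter, `total` would keep returning results computed with the old value of `a`.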

In this case I would do something simple. Maybe it's not the most elegant way, but it works for the problem:

class MyObject(object):
    param_values = {}    # a class attribute, so this cache is shared by all instances

    def __init__(self, a, b):
        self.a, self.b = a, b

    def total(self, x):
        if x not in MyObject.param_values:
            MyObject.param_values[x] = (self.a + self.b) * x
            print(str(x) + " was never called before")
        return MyObject.param_values[x]

    def subtotal(self, y, z):
        if y in MyObject.param_values:
            return MyObject.param_values[y] + z
        else:
            return self.total(y) + z
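One caveat with this approach: because `param_values` is a class attribute, the cache is shared across all instances, and the cache key ignores `self.a` and `self.b`. A condensed repro of that behaviour:

```python
class MyObject(object):
    param_values = {}          # shared by every instance

    def __init__(self, a, b):
        self.a, self.b = a, b

    def total(self, x):
        if x not in MyObject.param_values:
            MyObject.param_values[x] = (self.a + self.b) * x
        return MyObject.param_values[x]

m1 = MyObject(1, 2)
m2 = MyObject(10, 20)
m1.total(5)        # 15, computed with m1's attributes
m2.total(5)        # also 15 (!) -- m2 gets m1's cached value, not 150
```

Making `param_values` an instance attribute (set in `__init__`) would give each object its own cache and avoid this.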
developer_hatch