2

Consider

def f(x,*args):
    intermediate = computationally_expensive_fct(x)
    return do_stuff(intermediate,*args)

The problem: For the same x, this function might be called thousands of times with different arguments (other than x) and each time the function gets called intermediate would be computed (Cholesky factorisation, cost O(n^3)). In principle however it is enough if for each x, intermediate is computed only once for each x and then that result would be used again and again by f with different args.

My idea To remedy this, I tried to create a global dictionary where the function looks up whether for its parameter x the expensive stuff has already been done and stored in the dictionary or whether it has to compute it:

if all_intermediates not in globals():
    global all_intermediates = {}

if all_intermediates.has_key(x):
    pass
else:
    global all_intermediates[x] = computationally_expensive_fct(x)

It turns out I can't do this because globals() is a dict itself and you can't hash dicts in python. I'm a novice programmer and would be happy if someone could point me towards a pythonic way to do what I want to achieve.

Ben
  • 465
  • 2
  • 6
  • 17
  • 1
    This is usually called `@memoize`. https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize I've not figured out why the answers are reinventing the wheel. I don't see the need to muck about with globals (which is unkind to readers of the code). – msw Dec 19 '15 at 14:35
  • @msw I was reinventing the wheel as I didn't know about `@memoize`. And thanks for pointing to this. May be you should add your comment as answer. – taskinoor Dec 19 '15 at 14:50

3 Answers3

2

Solution

A bit more lightweight than writing a decorator and without accessing globals:

def f(x, *args):
    if not hasattr(f, 'all_intermediates'):
        f.all_intermediates = {}
    if x not in f.all_intermediates:
        f.all_intermediates[x] = computationally_expensive_fct(x)
    intermediate = f.all_intermediates[x]
    return do_stuff(intermediate,*args)

Variation

A variation that avoids the if not hasattr but needs to set all_intermediates as attribute of f after it is defined:

def f(x, *args):
    if x not in f.all_intermediates:
        f.all_intermediates[x] = computationally_expensive_fct(x)
    intermediate = f.all_intermediates[x]
    return do_stuff(intermediate,*args)
f.all_intermediates = {}

This caches all_intermediates as an attribute of the function itself.

Explanation

Functions are objects and can have attributes. Therefore, you can store the dictionary all_intermediates as an attribute of the function f. This makes the function self contained, meaning you can move it to another module without worrying about module globals. Using the variation shown above, you need move f.all_intermediates = {} along with function.

Putting things into globals() feels not right. I recommend against doing this.

Mike Müller
  • 82,630
  • 20
  • 166
  • 161
1

I don't get it why you are trying to use globals(). Instead of using globals() you can simply keep computed values in your own module level dictionary and have a wrapper function that will look up whether intermediate is already calculated or not. Something like this:

computed_intermediate = {}

def get_intermediate(x):
    if x not in computed_intermediate:
        computed_intermediate[x] = computationally_expensive_fct(x)

    return computed_intermediate[x]

def f(x,*args):
    intermediate = get_intermediate(x)
    return do_stuff(intermediate,*args)

In this way computationally_expensive_fct(x) will be calculated only once for each x, namely the first time it is accessed.

taskinoor
  • 45,586
  • 12
  • 116
  • 142
1

This is often implemented with the @memoized decorator on the expensive function.

It is described at https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize and brief enough to duplicate here in case of link rot:

import collections
import functools

class memoized(object):
   '''Decorator. Caches a function's return value each time it is called.
   If called later with the same arguments, the cached value is returned
   (not reevaluated).
   '''
   def __init__(self, func):
      self.func = func
      self.cache = {}
   def __call__(self, *args):
      if not isinstance(args, collections.Hashable):
         # uncacheable. a list, for instance.
         # better to not cache than blow up.
         return self.func(*args)
      if args in self.cache:
         return self.cache[args]
      else:
         value = self.func(*args)
         self.cache[args] = value
         return value
   def __repr__(self):
      '''Return the function's docstring.'''
      return self.func.__doc__
   def __get__(self, obj, objtype):
      '''Support instance methods.'''
      return functools.partial(self.__call__, obj)

Once the expensive function is memoized, using it is invisible:

@memoized
def expensive_function(n):
   # expensive stuff
   return something

p = expensive_function(n)
q = expensive_function(n)
assert p is q

Do note that if the result of expensive_function is not hashable (lists are a common example) there will not be a performance gain, it will still work, but act as if it isn't memoized.

msw
  • 42,753
  • 9
  • 87
  • 112
  • Answers that use the `in` operator to see if the answer has already been computed have to take the hashable property into account to work properly. I mention this because the factorization in the question will likely return a matrix; if the matrix is implemented as a list, it is not memoizable. You can also implement matrices as tuples (immutable lists, mostly) so you might be able to work around this limitation. – msw Dec 19 '15 at 15:18
  • The factorisation will return a big numpy array. Will your solution help under this constraint? – Ben Dec 19 '15 at 16:24