I have multiple classes with `compute` methods that take some (possibly very different) arguments, perform computations, and return the result. I wanted to create a base class that would cache the results and retrieve them from the cache.

Simple example:

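```python
import hashlib


class Base:
    # Subclasses must define a class-level `cache = {}` dict.

    def compute(self, *args):
        raise NotImplementedError

    def compute_with_cache(self, *args):
        # Build a key from the arguments and return the cached result if present.
        hash_ = self._hash_anything(args)
        try:
            return self.cache[hash_]
        except KeyError:
            # Cache miss: compute, store, and return the result.
            result = self.compute(*args)
            self.cache[hash_] = result
            return result

    @staticmethod
    def _hash_anything(args):
        # Relies on str(args) uniquely identifying the arguments (see problem 1 below).
        return hashlib.md5(str(args).encode('utf-8')).hexdigest()


class SpecificComputation(Base):
    cache = {}  # shared by all SpecificComputation instances

    def compute(self, some_arg, other_arg):
        result = ...  # performs possibly time-consuming operations
        return result
```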

This solution works to some extent, but it has two (maybe more?) problems:

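  1. It assumes that `str(args)` is different for every distinct set of arguments. This is usually true for simple data structures like lists or numbers, but it fails, e.g., for instances of classes without an unambiguous string representation: the default `repr` shows only the class name and memory address, not the attribute values. It also fails for large NumPy arrays, whose string form is abbreviated with `...`.
  2. It requires converting the arguments to a string and computing an MD5 digest, both of which take time.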
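A small demonstration of problem 1, using a hypothetical `Point` class that stands in for any argument type relying on the default `__repr__`. It shows both failure modes of a `str`-based key: a spurious cache miss for equal arguments and, worse, a stale cache hit after mutation.

```python
class Point:
    """Hypothetical argument type with no custom __repr__ or __eq__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p, q = Point(1, 2), Point(1, 2)

# Same state, different strings: the default repr embeds the object's
# address, so equal arguments produce different keys (a cache miss).
print(str((p,)) == str((q,)))  # False

key_before = str((p,))
p.x = 100
# Different state, same string: mutating p does not change its repr,
# so a stale cached result would be returned (a wrong cache hit).
print(str((p,)) == key_before)  # True
```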

The first problem could be solved using `pickle`, but that would slow things down even more.
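A minimal sketch of the pickle variant, assuming every argument is picklable (for class instances, that the class is importable by name). `pickle_key` is a hypothetical name. One way to win back some speed is to drop the md5 step and use the pickled bytes themselves as the dictionary key.

```python
import pickle
from typing import Any


def pickle_key(args: tuple[Any, ...]) -> bytes:
    """Hypothetical replacement for _hash_anything based on pickle."""
    # Pickle records an instance's attribute values rather than its repr,
    # so instances with equal state typically serialize identically.
    # The bytes are hashable and can be the dict key directly: the dict's
    # own hashing, plus an equality check on collisions, makes md5 unnecessary.
    return pickle.dumps(args, protocol=pickle.HIGHEST_PROTOCOL)
```

Pickle output is not canonical, though: `{'a': 1, 'b': 2}` and `{'b': 2, 'a': 1}` are equal but pickle to different bytes, and arguments that share sub-objects can differ from equal copies. That only costs extra cache misses; a class whose pickled state leaves out an attribute (e.g. via a custom `__getstate__`) could still produce a wrong hit.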

So, the question is: how can `_hash_anything` be implemented so that it:

  • is immune to the problem described in point 1, and

  • is as fast as possible?

Is there any general solution, assuming `args` could contain virtually anything?
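For the general case, one option is to skip strings and digests entirely and recursively convert the arguments into tuples and frozensets, which a `dict` hashes natively and compares with `==` on collision. The sketch below is an assumption-laden illustration, not a library function: `make_key` is a hypothetical helper, and it assumes arguments are built from builtin scalars and containers plus objects that keep their state in `__dict__`. Inside `compute_with_cache`, `make_key(args)` would take the place of `self._hash_anything(args)`.

```python
import types
from typing import Any

# Immutable scalars that compare by value can be used as keys unchanged.
_SCALARS = (type(None), bool, int, float, complex, str, bytes)


def _key_from_attributes(obj: Any) -> bool:
    """True for non-callable objects that keep their state in __dict__ and
    either use default identity equality or are unhashable."""
    if callable(obj) or isinstance(obj, types.ModuleType) or not hasattr(obj, "__dict__"):
        return False
    cls = type(obj)
    uses_identity = cls.__eq__ is object.__eq__ and cls.__hash__ is object.__hash__
    return uses_identity or cls.__hash__ is None


def make_key(obj: Any) -> Any:
    """Recursively convert obj into a hashable value that is equal for equal
    inputs. A sketch: cyclic structures will hit the recursion limit."""
    if isinstance(obj, _SCALARS):
        # As with functools.lru_cache(typed=False), 1, 1.0 and True are equal
        # and hash alike, so they share a cache entry.
        return obj
    if isinstance(obj, (list, tuple)):
        # Tag with the type so [1, 2] and (1, 2) get different keys.
        return (type(obj), tuple(make_key(x) for x in obj))
    if isinstance(obj, dict):
        # A frozenset of pairs ignores insertion order, matching dict equality.
        return (type(obj), frozenset((make_key(k), make_key(v)) for k, v in obj.items()))
    if isinstance(obj, (set, frozenset)):
        return (type(obj), frozenset(make_key(x) for x in obj))
    if _key_from_attributes(obj):
        # Key on the class plus the attribute values: equal state gives an
        # equal key, and mutating the object changes its key.
        return (type(obj), make_key(vars(obj)))
    # Anything else (functions, classes, enum members, frozen dataclasses, ...)
    # must already be hashable; hash() raises TypeError otherwise.
    hash(obj)
    return obj
```

Because the dict is keyed by the converted structure itself, a hash collision cannot return the wrong result, unlike a digest-based key. The cost is one traversal of the arguments per call, so very large containers remain expensive, and NumPy arrays are not handled (they raise `TypeError`).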

If not, is there an efficient solution assuming `args` contains only simple data structures (tuples of integers, lists of strings, etc.) or NumPy arrays (`np.ndarray`)?
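For NumPy arrays and other buffer-backed data, a sketch that hashes the raw memory instead of the string form (which NumPy abbreviates with `...` for large arrays). `buffer_key` is a hypothetical name; it assumes the object exports the buffer protocol, which holds for `bytes`, `bytearray`, `array.array` and NumPy arrays with ordinary numeric dtypes, but may not hold for object-dtype arrays.

```python
import hashlib


def buffer_key(obj: object) -> tuple:
    """Cache key for objects that export the buffer protocol (bytes,
    bytearray, array.array and, assuming a numeric dtype, NumPy arrays)."""
    with memoryview(obj) as view:
        # hashlib needs C-contiguous memory; copy only when the view is strided.
        data = view if view.c_contiguous else view.tobytes()
        digest = hashlib.blake2b(data, digest_size=16).digest()
        # format and shape separate buffers whose raw bytes happen to coincide,
        # e.g. a (2, 3) array and a (3, 2) array, or int32 vs float32 data.
        return (view.format, view.shape, digest)
```

The 16-byte BLAKE2b digest keeps keys small even for huge arrays; in exchange, two different arrays could in principle share a key, although the probability is negligible. A copy via `tobytes()` is made only for non-contiguous views, such as sliced arrays.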

Asked by jbet
  • Is this premature optimization? Perhaps get it working before worrying about performance... – dawg Mar 07 '20 at 17:21
  • How about using the `memopy` module? – Barmar Mar 07 '20 at 17:21
  • @Barmar this looks great, but `pip3 install memopy` says `ERROR: No matching distribution found for memopy`. With `-v` it finds `https://pypi.org/simple/memopy/` and stops somewhere around `Skipping link: none of the wheel's tags match: py2-none-any`. Does this work on Python 3? Or maybe I have to install it manually? – jbet Mar 07 '20 at 17:37
  • @jbet - looks like you could just copy the module and add it to your *project's* folder/directory. – wwii Mar 07 '20 at 17:56
  • Related: [Python - anyone have a memoizing decorator that can handle unhashable arguments?](https://stackoverflow.com/questions/4669391/python-anyone-have-a-memoizing-decorator-that-can-handle-unhashable-arguments) ... https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize – wwii Mar 07 '20 at 17:59
  • Unless I'm missing something critical, what about `functools.lru_cache`? – Paul M. Mar 07 '20 at 18:05
  • @user10987432 yes, unfortunately you're missing `Since a dictionary is used to cache results, the positional and keyword arguments to the function must be hashable.` in lru_cache docs : ) – jbet Mar 07 '20 at 19:24
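Following up on the `lru_cache` point: another option is to make the arguments hashable by design, so that `functools.lru_cache` (or the plain dict in `Base`) can rely on Python's built-in hashing with no conversion step at all. A minimal sketch, assuming `compute` depends only on its arguments and not on instance state; `Params` and its fields are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Params:
    """Hypothetical argument bundle; frozen=True plus tuple fields makes it hashable."""

    weights: tuple[float, ...]  # a tuple, not a list, so hashing works
    name: str


class SpecificComputation:
    @staticmethod
    @lru_cache(maxsize=None)
    def compute(params: Params, scale: int) -> float:
        # Stand-in for the expensive computation; the result must depend
        # only on the arguments for caching to be correct.
        return sum(params.weights) * scale
```

Stacking `@staticmethod` over `@lru_cache` keeps `self` out of the cache key; decorating an ordinary method with `lru_cache` would make each instance part of the key and keep it alive for as long as its entries remain cached.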
