I have multiple classes with methods compute
that take some (possibly very different) arguments, perform computations and return the result. I wanted to create a base class that would allow caching the results and retrieving them from cache.
Simple example:
import hashlib
class Base():
def compute(self, *args):
raise NotImplementedError
def compute_with_cache(self, *args):
hash_ = self._hash_anything(args)
try:
return self.cache[hash_]
except KeyError:
result = self.compute(*args)
self.cache[hash_] = result
return result
@staticmethod
def _hash_anything(args):
return hashlib.md5(str(args).encode('utf-8')).hexdigest()
class SpecificComputation(Base):
cache = {}
def compute(self, some_arg, other_arg):
result = ... # performs possibly time consuming operations
return result
This solution is somehow working, but has two (maybe more?) problems:
- Assumes that
str(args)
is different for every different set ofargs
, this is usually true for simple data structures like lists or numbers, but will fail e.g. for instances of classes without unequivocal string representations - Requires casting to string and computing md5, and that takes time
First problem could be solved using pickle
, but it will slow down even more.
So, the question is: how to implement _hash_anything
in a way that will:
be immune to problem described in 1
will be as fast as possible
?
Is there any general solution assuming args
could contain virtually anything?
If not, maybe there is an efficient solution assuming args
contain only simple data structures (tuples of integers, lists of strings etc) or np.arrays?