I'm using a memoized
decorator to cache repeated calls. I'm seeing about a 4x execution speedup due to memoization on some moderate-sized test cases.
With larger test cases, the memoized dictionary mapping inputs to outputs takes up a considerable amount of memory, to the point where I'm getting "java.lang.OutOfMemoryError: Java heap space"
errors (I'm using Jython).
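For context, a minimal sketch of the kind of decorator I mean (not my exact implementation): it caches results keyed on the positional arguments, which must be hashable.

```python
import functools

def memoized(func):
    # Cache results keyed on the positional arguments (must be hashable).
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoized
def fib(n):
    # Exponential without memoization, linear with it.
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```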
I can save some memory by storing memoized_cache[hash(key)] = value rather than memoized_cache[key] = value, assuming that hash(key) takes fewer bytes than key. As pointed out by @gnibbler, this will cause problems if there are hash collisions.
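To make the collision risk concrete, here is a CPython-specific illustration (the colliding values will differ under other implementations such as Jython): two distinct keys with equal hashes silently share one cache slot.

```python
memoized_cache = {}
key_a, key_b = -1, -2
# In CPython, hash(-1) == hash(-2) == -2, so the second store
# overwrites the first, and lookups for key_a return key_b's value.
memoized_cache[hash(key_a)] = "value for -1"
memoized_cache[hash(key_b)] = "value for -2"
```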
The other memory saving I can make is to limit the dictionary to a fixed number of items. A SizedDict recipe already exists, but I'd like to evict the elements that are accessed least often.
Here's what I have written:
```python
from collections import Counter

class FrequencySizedDict(dict):
    def __init__(self, size=1000):
        dict.__init__(self)
        self._maxsize = size
        self._counter = Counter()

    def __getitem__(self, key):
        self._counter[key] += 1
        return dict.__getitem__(self, key)

    def resize(self, size):
        # most_common() returns (key, count) pairs; keep only the keys
        keys = [key for key, count in self._counter.most_common(size)]
        items = [(key, dict.__getitem__(self, key)) for key in keys]
        self.clear()
        dict.update(self, items)

    def __setitem__(self, key, value):
        if len(self) >= self._maxsize:
            self.resize(self._maxsize // 2)
        self._counter[key] += 1
        dict.__setitem__(self, key, value)
```
Is there a better way to implement this with less memory or time overhead? resize is quite expensive: O(n log n).
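One direction I've considered (a sketch only, with a hypothetical name): instead of sorting all counts and rebuilding the dict, delete the k least-used keys in place with heapq.nsmallest, which is O(n log k) rather than O(n log n), and avoids copying the surviving entries.

```python
import heapq
from collections import Counter

class LazyFrequencySizedDict(dict):
    """Hypothetical variant: on overflow, evict the least frequently
    used keys in place via a partial sort instead of rebuilding."""
    def __init__(self, size=1000):
        dict.__init__(self)
        self._maxsize = size
        self._counter = Counter()

    def __getitem__(self, key):
        self._counter[key] += 1
        return dict.__getitem__(self, key)

    def __setitem__(self, key, value):
        # Only evict when inserting a genuinely new key at capacity.
        if len(self) >= self._maxsize and key not in self:
            # Shrink to half capacity: drop the k least-used entries.
            k = len(self) - self._maxsize // 2
            victims = heapq.nsmallest(k, self._counter.items(),
                                      key=lambda kv: kv[1])
            for victim, _count in victims:
                dict.__delitem__(self, victim)
                del self._counter[victim]
        self._counter[key] += 1
        dict.__setitem__(self, key, value)
```

Overwriting an existing key never triggers eviction, so hot entries stay put; the trade-off is the same Counter bookkeeping, just a cheaper trim.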