I'm using a memoized
decorator to cache repeated calls. I'm seeing about a 4x execution speedup due to memoization on some moderate-sized test cases.
With larger test cases, the memoized dictionary mapping inputs to outputs takes up a considerable amount of memory, to the point where I'm getting "java.lang.OutOfMemoryError: Java heap space"
errors (I'm using Jython).
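For context, a minimal sketch of the kind of decorator I mean (not my exact implementation): it caches results keyed on the positional arguments, which must be hashable.

```python
import functools

def memoized(func):
    # Cache results keyed on the positional arguments (must be hashable).
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoized
def fib(n):
    # Exponential without memoization, linear with it.
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```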
I can save some memory by storing memoized_cache[hash(key)] = value rather than memoized_cache[key] = value, assuming that hash(key) takes fewer bytes than key. As pointed out by @gnibbler, this will cause problems if there are hash collisions.
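To make the collision risk concrete, here is a CPython-specific illustration (the colliding values will differ under other implementations such as Jython): two distinct keys with equal hashes silently share one cache slot.

```python
memoized_cache = {}
key_a, key_b = -1, -2
# In CPython, hash(-1) == hash(-2) == -2, so the second store
# overwrites the first, and lookups for key_a return key_b's value.
memoized_cache[hash(key_a)] = "value for -1"
memoized_cache[hash(key_b)] = "value for -2"
```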
The other memory saving I can make is to limit the dictionary to a fixed number of items. A SizedDict recipe already exists, but I'd like to evict the elements that are accessed least often.
Here's what I have written:
```python
from collections import Counter

class FrequencySizedDict(dict):
    def __init__(self, size=1000):
        dict.__init__(self)
        self._maxsize = size
        self._counter = Counter()

    def __getitem__(self, key):
        self._counter[key] += 1
        return dict.__getitem__(self, key)

    def resize(self, size):
        # most_common() returns (key, count) pairs; keep only the keys
        keys = [key for key, count in self._counter.most_common(size)]
        items = [(key, dict.__getitem__(self, key)) for key in keys]
        self.clear()
        dict.update(self, items)

    def __setitem__(self, key, value):
        if len(self) >= self._maxsize:
            self.resize(self._maxsize // 2)
        self._counter[key] += 1
        dict.__setitem__(self, key, value)
```
Is there a better way to implement this with less memory or time overhead? resize is quite expensive: O(n log n).
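One direction I've considered (a sketch only, with a hypothetical name): instead of sorting all counts and rebuilding the dict, delete the k least-used keys in place with heapq.nsmallest, which is O(n log k) rather than O(n log n), and avoids copying the surviving entries.

```python
import heapq
from collections import Counter

class LazyFrequencySizedDict(dict):
    """Hypothetical variant: on overflow, evict the least frequently
    used keys in place via a partial sort instead of rebuilding."""
    def __init__(self, size=1000):
        dict.__init__(self)
        self._maxsize = size
        self._counter = Counter()

    def __getitem__(self, key):
        self._counter[key] += 1
        return dict.__getitem__(self, key)

    def __setitem__(self, key, value):
        # Only evict when inserting a genuinely new key at capacity.
        if len(self) >= self._maxsize and key not in self:
            # Shrink to half capacity: drop the k least-used entries.
            k = len(self) - self._maxsize // 2
            victims = heapq.nsmallest(k, self._counter.items(),
                                      key=lambda kv: kv[1])
            for victim, _count in victims:
                dict.__delitem__(self, victim)
                del self._counter[victim]
        self._counter[key] += 1
        dict.__setitem__(self, key, value)
```

Overwriting an existing key never triggers eviction, so hot entries stay put; the trade-off is the same Counter bookkeeping, just a cheaper trim.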