21

I'm using an LRU cache to speed up some rather heavy-duty processing. It works well and speeds things up considerably. However...

When I multiprocess, each process creates its own separate cache and there are 8 copies of the same thing. That doesn't appear to be a problem, until the box runs out of memory and bad things happen as a result...

Ideally I only need a cache size of around 300 items for the application, and 1*300 will fit in the 7 GB I have to work with, but 8*300 just doesn't fit.

How do I get all the processes to share the same cache?

John Mee
  • The Python `multiprocessing` module has a section on [sharing state between processes](http://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes). – Sam Mussmann Dec 03 '12 at 23:58
  • is an out-of-process cache an option for you? Could you use pickle on your cached state and store it in, say, redis? – SingleNegationElimination Dec 04 '12 at 00:00
  • @sam Indeed, it appears the `Manager` can share a `dict`, which I suspect is what the lru cache is internally. I guess I'm hoping someone has hacked that up before to suit this problem so I don't have to ;-) – John Mee Dec 04 '12 at 00:03
  • @TokenMacGuy I like your thinking, but it's high-intensity (thousands per millisecond), so the mention of pickling has me immediately prejudging it as too slow. – John Mee Dec 04 '12 at 00:08
  • Is there a way to do it with different processes started by gunicorn + bottle? – Alberto Bonsanto Sep 03 '19 at 21:33
  • @AlbertoBonsanto Interesting question, certainly worth searching for/opening up a new question for that. I assume you mean for sharing objects between requests. – John Mee Sep 03 '19 at 23:49

2 Answers

13

I believe you can use a `Manager` to share a `dict` between processes. That should in theory let you use the same cache for all functions.
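A minimal sketch of that approach (hypothetical code; `compute` stands in for whatever expensive call is being cached, and note that every lookup is a round-trip to the manager process):

import multiprocessing

def compute(key):
    # stand-in for the expensive work whose results we want to cache
    return key * 2

def worker(args):
    cache, key = args
    # every access goes through the Manager process, so all workers see one cache
    if key not in cache:
        cache[key] = compute(key)
    return cache[key]

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    cache = manager.dict()  # a single dict proxy shared by every worker
    pool = multiprocessing.Pool(8)
    results = pool.map(worker, [(cache, key) for key in range(100)])

Bear in mind this shares a plain `dict`, not a true LRU: nothing is evicted, and the check-then-set isn't atomic, so two processes can occasionally compute the same key.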

However, I think a saner logic would be to have one process that responds to queries by looking them up in the cache, and, if they are not present, delegating the work to a subprocess and caching the result before returning it. You could easily do that with:

import concurrent.futures
import functools

with concurrent.futures.ProcessPoolExecutor() as e:
    @functools.lru_cache(maxsize=300)  # note the call: a bare @lru_cache needs Python 3.8+
    def work(*args, **kwargs):
        return e.submit(slow_work, *args, **kwargs)

Note that `work` will return `Future` objects, which the consumer will have to wait on. The `lru_cache` caches the `Future` objects, so they will be returned automatically; I believe you can access their data more than once, but can't test it right now.
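For example, the consumer side might look like this (hypothetical caller; `result()` can safely be called more than once on a finished `Future`):

future = work(5)                 # cache miss: submits slow_work(5) to the pool
value = future.result()          # blocks until the result is ready
assert work(5) is future         # cache hit: the very same Future comes back
assert future.result() == value  # the data can be read again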

If you're not using Python 3, you'll have to install backported versions of `concurrent.futures` and `functools.lru_cache`.
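On Python 2 those backports are, if memory serves, the `futures` and `functools32` packages on PyPI:

pip install futures functools32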

Katriel
0

Pass the shared cache to each process. The parent process can instantiate a single cache and pass it to each process as an argument...

@utils.lru_cache(maxsize=300)  # any LRU decorator works here, e.g. functools.lru_cache
def get_stuff(key):
    """This is the routine that does the stuff which can be cached.
    """
    return Stuff(key)

def process(stuff_obj):
    """This is the routine which multiple processes call to do work with that Stuff
    """
    # get_stuff(key) <-- Wrong; I was calling the cache from here
    stuff_obj.execute()

def iterate_stuff(keys):
    """This generates work for the processses.
    """
    for key in keys:
        yield get_stuff(key)  # <-- I can call the cache from the parent

def main():
    ...
    keys = get_list_of_keys()
    for result in pool.imap(process, iterate_stuff(keys)):
        evaluate(result)
    ...

This example is simple because I can look up the cache before calling the process. Some scenarios might prefer to pass a reference to the cached function rather than the value, e.g.:

        yield (key, get_stuff)
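In that variant the worker does the lookup itself, something like (a hypothetical variant of `process` above; note that each child still ends up with its own copy of the cache once the function is pickled across):

def process(args):
    key, fetch = args       # fetch is the cached lookup function
    stuff_obj = fetch(key)
    stuff_obj.execute()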

Katriel put me on the right track, and I would implement that answer, but, silly me, my mistake was even simpler to solve than what he suggested.

John Mee
  • This will improve performance only if `get_stuff(key)` returns small objects; passing a large object to a child process will take a long time. – betontalpfa Feb 19 '21 at 14:49