Ensure queryset is cached

Question

I am calling a function that starts a long-running process in which many different things are done. This function chiefly handles instances of a particular class, Item. These items are categorized by different attributes: category1, category2 and category3.

Now, there is a different model that applies rules to these categories: Rule, with the many-to-many attributes categories1, categories2 and categories3. A rule applies to an Item; if the same rule points to different categories, only one of them should be applied. Which one is decided by logic encapsulated in a function:

class Rule(models.Model):
    warehouse = models.ForeignKey('Warehouse')
    categories1 = models.ManyToManyField('Category1')
    categories2 = models.ManyToManyField('Category2')
    categories3 = models.ManyToManyField('Category3')

    @staticmethod
    def get_rules_that_applies(item):
        rules = warehouse.rule_set.all()
        if not rules.exists():
            return None
        # ... determine which rule applies to the item by filtering, etc.
        return rule

The issue lies in the get_rules_that_applies method. Every time we need the rule that applies to a certain item, warehouse.rule_set.all() is called, and, to say it again, a great many items are involved in this process.

Since the rules will not change during this process, we could simply cache all the rules in the warehouse, but how? How can I make sure rules = warehouse.rule_set.all() is cached, so that all filtering and QuerySet operations that act on these rules do not hit the database?

@DimaKudosh yeah I should have mentioned it: the `warehouse` is like the "master/context" object where all the process happens. Everything in the app revolves around a single `warehouse` instance. — dabadaba, Aug 10 '17 at 09:59

John Moutafis · Accepted Answer · 2017-11-24T12:01:55.340

I believe that the solution you are seeking is the memoization of the get_rules_that_applies method.

There is a ready-made tool for that called django-memoize; see its documentation for details.

Quick-start on usage:

pip install django-memoize

Add it to your INSTALLED_APPS:

INSTALLED_APPS = [
    '...',
    'memoize',
]

In your models.py:

from memoize import memoize

class Rule(models.Model):
    warehouse = models.ForeignKey('Warehouse')
    categories1 = models.ManyToManyField('Category1')
    categories2 = models.ManyToManyField('Category2')
    categories3 = models.ManyToManyField('Category3')

    @staticmethod
    @memoize(timeout=something_reasonable_in_seconds)
    def get_rules_that_applies(item):
        rules = warehouse.rule_set.all()
        if not rules.exists():
            return None
        # ... determine which rule applies to the item by filtering, etc.
        return rules

(Update) A Semi-DIY Approach:

Since my answer, I read the following post: https://www.peterbe.com/plog/cache_memoize-cache-decorator-for-django which is accompanied by a gist on how to achieve memoization yourself.

A More DIY Approach:

Python 3.2 and up:

Use the @functools.lru_cache decorator, which the Python documentation describes as a:

Decorator to wrap a function with a memoizing callable that saves up to the maxsize most recent calls. It can save time when an expensive or I/O bound function is periodically called with the same arguments.

How to use it:

from functools import lru_cache


class Rule(models.Model):
    ...

    @staticmethod
    @lru_cache(maxsize=a_reasonable_integer_size_of_cache)
    def get_rules_that_applies(item):
        rules = warehouse.rule_set.all()
        if not rules.exists():
            return None
        # ... determine which rule applies to the item by filtering, etc.
        return rules

maxsize: Defines the size of the cache in function calls to be stored. It can be set to None to cache every call.
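To make the behaviour of lru_cache concrete, here is a minimal, standard-library-only sketch. The function load_rules and its return value are hypothetical stand-ins for an expensive lookup; they are not part of the original code. Note that the arguments of a cached function must be hashable.

```python
from functools import lru_cache

calls = {"count": 0}  # tracks how often the expensive body actually runs


@lru_cache(maxsize=None)
def load_rules(warehouse_id):
    """Hypothetical stand-in for an expensive lookup such as a database query."""
    calls["count"] += 1
    return ("rule-a", "rule-b")  # a tuple, so the cached value cannot be mutated


first = load_rules(1)
second = load_rules(1)  # same argument: served from the cache, body does not run
other = load_rules(2)   # different argument: the body runs once more
info = load_rules.cache_info()  # named tuple with hits, misses, maxsize, currsize
```

If the cache should only live for one run of the process, calling load_rules.cache_clear() when the process finishes empties it again.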

Python < 3.2

The Stack Overflow question "What is memoization and how can I use it in Python?" shows a more "old school" approach.
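As a rough idea of what such a hand-written approach can look like, here is a minimal sketch of a dictionary-based memoize decorator. It is an illustration written for this answer, not code copied from the linked question, and rules_for is a hypothetical stand-in for the expensive rule lookup.

```python
import functools


def memoize(func):
    """Cache results of func keyed by its positional arguments (must be hashable)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed so callers can inspect or clear it
    return wrapper


call_log = []  # records every real (non-cached) call


@memoize
def rules_for(warehouse_id):
    # Hypothetical stand-in for the expensive rule lookup.
    call_log.append(warehouse_id)
    return ["rule-%s" % warehouse_id]


rules_for(7)
rules_for(7)  # answered from the dictionary, the function body is skipped
```

Because the cache is a plain dictionary, it never expires on its own; rules_for.cache.clear() resets it when the process is done.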

How to cache a queryset with either of the above methods:

Why not define an intermediate function that builds the queryset and cache that function's results?

@lru_cache(maxsize=None)  # or: @memoize()
def middle_function():
    return warehouse.rule_set.all()

and then in your get_rules_that_applies function:

def get_rules_that_applies(item):
    rules = middle_function()
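One caveat with caching the queryset object itself: a Django QuerySet is lazy, and chaining .filter() on it builds a new query that goes to the database again. A possible way around this is to cache the evaluated rules (for example list(warehouse.rule_set.all())) and do the selection in plain Python. The sketch below shows the idea without Django; fetch_rules, RULE_ROWS and the dictionary layout are hypothetical stand-ins for the real models.

```python
from functools import lru_cache

db_hits = []  # records every simulated database query

# Hypothetical data standing in for the rows of the Rule table.
RULE_ROWS = [
    {"id": 1, "warehouse_id": 1, "categories1": {"a", "b"}},
    {"id": 2, "warehouse_id": 1, "categories1": {"c"}},
    {"id": 3, "warehouse_id": 2, "categories1": {"a"}},
]


def fetch_rules(warehouse_id):
    """Stand-in for list(warehouse.rule_set.all()): one query, fully evaluated."""
    db_hits.append(warehouse_id)
    return [row for row in RULE_ROWS if row["warehouse_id"] == warehouse_id]


@lru_cache(maxsize=None)
def cached_rules(warehouse_id):
    # Store a tuple so callers cannot add or remove cached rules.
    return tuple(fetch_rules(warehouse_id))


def get_rule_that_applies(warehouse_id, item_category1):
    """Pick the first matching rule using plain Python, not a new query."""
    for rule in cached_rules(warehouse_id):
        if item_category1 in rule["categories1"]:
            return rule
    return None


matches = [get_rule_that_applies(1, c) for c in ("a", "c", "z")]
```

Here three lookups for warehouse 1 trigger only one simulated query. The trade-off is that the selection logic has to be written in Python rather than as queryset filters.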

I'm not really after a cache with a timeout; it can and should be cached during the entire lifecycle of the execution of the process (you can easily tell when it finishes because the `warehouse` instance is freed from memory). Also, that would cache the rule for the item, right (since it's caching the function's results)? Can't I just force-cache a queryset? — dabadaba, Aug 10 '17 at 10:24
@dabadaba I made an edit on how to cache a queryset as I imagine it :) Btw, for `memoize` I don't know, but the `lru_cache` doesn't seem to have a timeout. — John Moutafis, Aug 10 '17 at 10:37
I just realized I can't use `django-memoize` because my Django version is older. — dabadaba, Aug 10 '17 at 10:57
@dabadaba You can still use the `@lru_cache` though or create your own as shown here: https://stackoverflow.com/questions/1988804/what-is-memoization-and-how-can-i-use-it-in-python?answertab=votes#tab-top — John Moutafis, Aug 10 '17 at 10:58
@dabadaba If you cannot upgrade to Python 3.2+ or a newer version of Django, you can create your own `memoize` class as shown in the linked SO answer (I have edited my answer to include that as well). Good luck! — John Moutafis, Aug 10 '17 at 11:29

score 0 · Answer 2 · answered Aug 10 '17 at 09:54

You have two options:

cache the item in the view
cache the item in the model

The code will be the same in the view and in the model. Import the cache:

from django.core.cache import cache

Code:

cached = cache.get('query_result')
if cached is not None:
    return cached
else:
    # cache.set(key, value, timeout_in_seconds)
    cache.set('query_result', rule, 3600)
    return rule
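The get-then-set pattern above can be sketched in a self-contained way as follows. LocalCache is a hypothetical in-memory stand-in that only mimics the get() and set() calls used here; it is not Django's cache backend. The key name and the rule values are also assumptions made for illustration. The sketch uses one key per warehouse and a sentinel default, so that a cached None is not mistaken for a miss.

```python
class LocalCache:
    """Hypothetical in-memory stand-in exposing only get() and set()."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        # The timeout is accepted but ignored in this sketch.
        self._data[key] = value


cache = LocalCache()
_MISSING = object()  # sentinel, so a cached None is not mistaken for a miss
computations = []    # records every time the expensive branch runs


def rules_for_warehouse(warehouse_id):
    key = "warehouse-rules-%s" % warehouse_id  # one key per warehouse
    rules = cache.get(key, _MISSING)
    if rules is _MISSING:
        computations.append(warehouse_id)  # simulate the expensive query
        rules = ["rule-%s-a" % warehouse_id, "rule-%s-b" % warehouse_id]
        cache.set(key, rules, 3600)
    return rules


first = rules_for_warehouse(1)
second = rules_for_warehouse(1)  # served from the cache
third = rules_for_warehouse(2)   # different key, computed once
```

Looking the key up only once also avoids the double cache.get() call of the original snippet.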

Your model will be:

class Rule(models.Model):
    warehouse = models.ForeignKey('Warehouse')
    categories1 = models.ManyToManyField('Category1')
    categories2 = models.ManyToManyField('Category2')
    categories3 = models.ManyToManyField('Category3')

    @staticmethod
    def get_rules_that_applies(item):
        # Return the cached result if there is one.
        cached = cache.get('query_result')
        if cached is not None:
            return cached
        rules = warehouse.rule_set.all()
        if not rules.exists():
            return None
        # ... determine which rule applies to the item by filtering, etc.
        # cache.set(key, value, timeout_in_seconds)
        cache.set('query_result', rule, 3600)
        return rule

Some information on when Django querysets are evaluated:

https://docs.djangoproject.com/en/1.11/ref/models/querysets/#when-querysets-are-evaluated

Hope this helps.

this would be caching the rule that applies to the item, not the source queryset of rules — dabadaba, Aug 10 '17 at 10:21
