1

I am calling a function that starts a process that takes longer to execute, many different things are done. This function chiefly handles instances of a particular class, Item. These items are categorized by different attributes: category1, category2 and category3.

Now, there is a different model that applies some sort of rules to these categories: Rule with many-to-many attributes: categories1, categories2 and categories3. A rule applies to an Item, if the same rule points to different categories, only one of them should be applied. The decision of which one is defined by a certain logic encapsulated in a function:

class Rule(models.Model):
    warehouse = models.ForeignKey('Warehouse')
    categories1 = models.ManyToManyField('Category1')
    categories2 = models.ManyToManyField('Category2')
    categories3 = models.ManyToManyField('Category3')

    @staticmethod
    def get_rules_that_applies(item):
        rules = warehouse.rule_set.all()
        if not rules.exists():
            return None
        # ... determine which rule applies to the item by filtering, etc.
        return rule

The issue lies in the get_rules_that_applies method. Every time we need to get the rule that applies to a certain item and let me say again that many many items are involved in the process we are talking about, warehouse.rule_set.all() is called.

Since the rules will not change during this process, we can just cache all the rules in the ware house, but how? How can I make sure warehouse = warehouse.rule_set.all() is cached and all filtering and QuerySet operations that act on these rules are not hitting the database?

John Moutafis
  • 22,254
  • 11
  • 68
  • 112
dabadaba
  • 9,064
  • 21
  • 85
  • 155
  • What is warehouse in get_rules_that_applies? – Dima Kudosh Aug 10 '17 at 09:29
  • @DimaKudosh yeah I should have mentioned it: the `warehouse` is like the "master/context" object where all the process happens. Everything in the app revolves around a single `warehouse` instance. – dabadaba Aug 10 '17 at 09:59

2 Answers2

1

I believe that the solution you are seeking is the memoization of the get_rules_that_applies method.

There is a tool ready-made for that, called django-memoize and those are its docs.

Quick-start on usage:

  1. pip install django-memoize
  2. Place it on your INSTALLED_APPS

    INSTALLED_APPS = [
        '...',
        'memoize',
    ]
    
  3. In your model.py:

    from memoize import memoize
    
    class Rule(models.Model):
        warehouse = models.ForeignKey('Warehouse')
        categories1 = models.ManyToManyField('Category1')
        categories2 = models.ManyToManyField('Category2')
        categories3 = models.ManyToManyField('Category3')
    
        @staticmethod
        @memoize(timeout=something_reasonable_in_seconds)
        def get_rules_that_applies(item):
            rules = warehouse.rule_set.all()
            if not rules.exists():
                return None
               # ... determine which rule applies to the item by filtering, etc.
            return rules
    

(Update) A Semi-DIY Approach:

Since my answer, I read the following post: https://www.peterbe.com/plog/cache_memoize-cache-decorator-for-django which is accompanied by a gist on how to achieve memoization yourself.


A More DIY Approach:

Python 3.2 and up:

The @functools.lru_cache decorator which is a:

Decorator to wrap a function with a memoizing callable that saves up to the maxsize most recent calls. It can save time when an expensive or I/O bound function is periodically called with the same arguments.

How to use it:

from functools import lru_cache


class Rule(models.Model):
    ...

    @lru_cache(maxsize=a_reasonable_integer_size_of_cache)
    def get_rules_that_applies(item):
        rules = warehouse.rule_set.all()
        if not rules.exists():
            return None
            # ... determine which rule applies to the item by filtering, etc.
        return rules

maxsize: Defines the size of the cache in function calls to be stored. It can be set to None to cache every call.

Python < 3.2

In here What is memoization and how can I use it in Python? exist a more "old school" approach.


How to cache a queryset with either of the above methods:

Why not define an intermediate function to form the queryset and cache that functions results?

@lru_cache(maxsize=None)

or 

@memoize()
def middle_function():
    return warehouse.rule_set.all()

and then in your get_rules_that_applies function:

def get_rules_that_applies(item):
    rules = middle_function()
John Moutafis
  • 22,254
  • 11
  • 68
  • 112
  • I'm not really after a cache with a timeout, it can and should be cashed during the entire lifecycle of the execution of the process (you can easily tell when it finishes when the `warehouse` instance is freed up from memory). Also, that would cache the rule for the item right? (since it's caching the function results) Can't I just force-cache a queryset? – dabadaba Aug 10 '17 at 10:24
  • @dabadaba I made an edit on how to cache a queryset as I imagine it :) Btw, for `memoize` I don't know, but the `lru_cache` doesn't seem to have a timeout. – John Moutafis Aug 10 '17 at 10:37
  • I just realized I can't use `django-memoize` because my Django version is older. – dabadaba Aug 10 '17 at 10:57
  • @dabadaba You can still use the `@lru_cache` though or create your own as shown here: https://stackoverflow.com/questions/1988804/what-is-memoization-and-how-can-i-use-it-in-python?answertab=votes#tab-top – John Moutafis Aug 10 '17 at 10:58
  • using Python 2.7 so I guess I can't – dabadaba Aug 10 '17 at 11:00
  • @dabadaba If you cannot upgrade to Python 3.2+ or A greater version of Django, you can create your own `memoize` class as show in the linked so answer (I have edited my answer to include that as well). Good luck! – John Moutafis Aug 10 '17 at 11:29
0

You have 2 option:

  1. cache the item in the view
  2. cahce the item in the model

The code will be the same in view and in the model, Import cahce:

from django.core.cache import cache

Code:

if cache.get('query_result') is not None:
    return cache.get('query_result')
else:
    cache.set('query_result', result, 3600)
    #cache.set('cache_name', 'your query', 'expiry time')
    return rule

Your model will be:

class Rule(models.Model):
warehouse = models.ForeignKey('Warehouse')
categories1 = models.ManyToManyField('Category1')
categories2 = models.ManyToManyField('Category2')
categories3 = models.ManyToManyField('Category3')

@staticmethod
def get_rules_that_applies(item):
    rules = warehouse.rule_set.all()
    if not rules.exists():
        return None
    # ... determine which rule applies to the item by filtering, etc.
    if cache.get('query_result') is not None:
        return cache.get('query_result')
    else:
        cache.set('query_result', result, 3600)
        #cache.set('cache_name', 'your query', 'expiry time')
        return rule

    return rule

Few info about Django query, When they are evaluated?:

https://docs.djangoproject.com/en/1.11/ref/models/querysets/#when-querysets-are-evaluated

Hope this help

Mattia
  • 961
  • 13
  • 25