TLDR

Is there a way to mark cached values so I could do something like:

cache.filter('some_tag').clear()

Details

In my project I have the following model:

from django.db import models

class Item(models.Model):
    month = models.DateField('month', null=False, blank=False, db_index=True)
    kg = models.BigIntegerField('kg')
    tags = models.ManyToManyField('Tag', related_name='items')
    # bunch of other fields used to filter data

And I have a report_view that returns the sum of kg by month and by tag according to the filters supplied in the URL query.

Something like this:

--------------------------------
|Tag    |jan    |feb    |mar   |
--------------------------------
|Tag 1  |1000   |1500   |2000  |
--------------------------------
|Tag 2  |1235   |4652   |0     |
--------------------------------

As my Item table already has more than 4 million records and is always growing, my report_view is cached.

So far I got all of this covered.

The problem is: site users can change the tags on Items, and every time this happens I have to invalidate the cache, but I would like to do it in a more granular way.

For example, if a user changes a tag on an Item from January, that should invalidate all the totals for that month (I prefer to cache by month because sometimes changing one tag has a cascading effect on others). However, I don't know all the views that have been cached, as there are thousands of possible filter combinations that change the URL.

What I have done so far:

  • Set up a signal to invalidate all my caches when a tag changes:
@receiver(m2m_changed, sender=Item.tags.through)
def tags_changed(sender, **kwargs):
    cache.clear()

But this clears everything, which is not optimal in my case. Is there a way of doing something like cache.filter('some_tag').clear() with the Django cache framework?

Danilo Favato

1 Answer

https://martinfowler.com/bliki/TwoHardThings.html

There are only two hard things in Computer Science: cache invalidation and naming things.

-- Phil Karlton

Presuming you are using Django's Cache Middleware, you'll need to target the cache keys that are relevant. You can see how the cache key is generated at these locations in the Django project:

- https://github.com/django/django/blob/master/django/middleware/cache.py#L99
- https://github.com/django/django/blob/master/django/utils/cache.py#L367
- https://github.com/django/django/blob/master/django/utils/cache.py#L324

_generate_cache_key

def _generate_cache_key(request, method, headerlist, key_prefix):
    """Return a cache key from the headers given in the header list."""
    ctx = hashlib.md5()
    for header in headerlist:
        value = request.META.get(header)
        if value is not None:
            ctx.update(force_bytes(value))
    url = hashlib.md5(force_bytes(iri_to_uri(request.build_absolute_uri())))
    cache_key = 'views.decorators.cache.cache_page.%s.%s.%s.%s' % (key_prefix, method, url.hexdigest(), ctx.hexdigest())
    return _i18n_cache_key_suffix(request, cache_key)

The cache key is generated from attributes and headers of the request, combined with hashed values (i.e. the URL is hashed and used as part of the key). The Vary header in your response specifies which other request headers are used as part of the cache key.
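To make the consequence of that hashing concrete, here is a minimal sketch that mimics only the URL portion of the key. The helper name `url_part_of_key` and the example URLs are made up for illustration, and it encodes the URL directly instead of going through Django's `force_bytes(iri_to_uri(...))`, which should be equivalent for plain ASCII URLs like these.

```python
import hashlib


def url_part_of_key(absolute_uri):
    """Mimic only the URL portion of the key: an MD5 hex digest of the full URL."""
    return hashlib.md5(absolute_uri.encode("utf-8")).hexdigest()


# Two hypothetical report URLs that differ only in the month filter.
jan = url_part_of_key("https://example.com/report/?month=2017-01&tag=1")
feb = url_part_of_key("https://example.com/report/?month=2017-02&tag=1")

print(jan)
print(feb)
# Each digest is 32 hex characters and contains nothing of the original query
# string, so a wildcard such as "*month=2017-01*" cannot match a stored key.
```

Nothing about the month or the tag survives in the key, so there is no text left for a pattern to match.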

If you understand how Django caches your views and computes the cache keys, you can use that to target the appropriate cache entries. This is still very difficult, though: because the URL is hashed, you can't target URL patterns (otherwise you could use cache.delete_patterns(...), see https://stackoverflow.com/a/35629796/784648).
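If you cache the report data yourself under keys you choose (rather than relying on the Cache Middleware), one way to get something close to cache.filter('some_tag').clear() is to keep an index of keys per tag. The following is only a rough sketch under that assumption: `DictCache` is an in-memory stand-in for the get/set/delete subset of Django's cache API so the example runs on its own, and `set_tagged`, `clear_tag` and the key names are hypothetical helpers, not part of Django.

```python
class DictCache:
    """In-memory stand-in exposing the get/set/delete subset of Django's cache API."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value  # timeout is ignored in this stand-in

    def delete(self, key):
        self._data.pop(key, None)


cache = DictCache()  # assumption: replace with django.core.cache.cache in a project


def set_tagged(key, value, tag):
    """Store a value and record its key in an index kept under the tag."""
    cache.set(key, value)
    index_key = "tagindex:%s" % tag
    keys = cache.get(index_key, [])
    if key not in keys:
        cache.set(index_key, keys + [key])


def clear_tag(tag):
    """Delete every key recorded under the tag, then the index itself."""
    index_key = "tagindex:%s" % tag
    for key in cache.get(index_key, []):
        cache.delete(key)
    cache.delete(index_key)


# Hypothetical report entries, tagged by the month they depend on.
set_tagged("report:2017-01:filters-a", {"Tag 1": 1000}, tag="month:2017-01")
set_tagged("report:2017-01:filters-b", {"Tag 2": 1235}, tag="month:2017-01")
set_tagged("report:2017-02:filters-a", {"Tag 1": 1500}, tag="month:2017-02")

clear_tag("month:2017-01")  # drops only the January entries
```

Note that the read-modify-write on the index is not atomic, so with a shared backend and concurrent requests a key could be missed. Treat this as an illustration of the idea rather than something production-ready.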

Django primarily relies on timeout to invalidate the cache.

I would recommend looking into Django Cacheops. This package is designed to work with Django's ORM to cache and invalidate QuerySets. It seems a lot more practical for your needs: you want fine-grained invalidation on your Item QuerySets, and you simply will not get that from Django's Cache Middleware. Take a look at the GitHub repo; I've used it, and it works well if you take the time to read the docs and understand it.

A. J. Parr
  • Thanks for your response. Even if I know how the `cache_keys` are generated for the `URL`, I wouldn't be able to generate every `cache_key` possibility. There are many `filter` fields with thousands of different possible values; I would have to loop over all the possibilities to check if the cache exists and then delete it... The `cache.delete_patterns()` seems like a better option, but it needs an external package. I don't know if it's worth installing it just for one feature... I'll check how they implemented this cache logic... – Danilo Favato Jul 27 '17 at 16:09