
Let's say I have a Transaction model:

from django.db import models

class Transaction(models.Model):
    receiver = models.CharField(max_length=100)  # max_length is required on CharField
    value = models.DecimalField(max_digits=10, decimal_places=5)
    date = models.DateField()

Now, I have tens of thousands of transactions that go into the table.

I need to show the receiver the transactions for each day, week, month, year, etc.

I can do this by using the following statement:

from django.db.models import Sum
transactions = Transaction.objects.filter(receiver="name")
total = transactions.aggregate(Sum('value'))

and then filter down to whatever date period I need before aggregating. Example:

transactions.filter(date__gte=start_date, date__lte=end_date).aggregate(Sum('value'))
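
For the per-day/week/month/year breakdowns I had something like this in mind, using Django's Trunc functions (TruncDay, TruncWeek and TruncYear work the same way); I'm not sure whether this is the right approach either:

from django.db.models import Sum
from django.db.models.functions import TruncMonth

# monthly totals for one receiver, one row per month
monthly_totals = (
    Transaction.objects
    .filter(receiver="name")
    .annotate(period=TruncMonth('date'))
    .values('period')
    .annotate(total=Sum('value'))
    .order_by('period')
)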

How fast is this? What happens if the table grows to have millions of entries? How do I make sure that it is optimized? Does Django cache the values from Sum?

My naive idea for optimizing this was to create additional models (DayTransaction, MonthTransaction, YearTransaction, etc.) and update all of them whenever the Transaction model is updated.

This way, when users request the data, I read it from the models that "cache" the summed data. No aggregation has to happen at request time; the data is just retrieved from the respective tables, which are a lot smaller than the Transaction one.
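
A rough sketch of what I mean, with MonthTransaction as an example (the model, its fields and the tx variable are just illustrative):

class MonthTransaction(models.Model):
    receiver = models.CharField(max_length=100)
    month = models.DateField()  # first day of the month
    total = models.DecimalField(max_digits=14, decimal_places=5)

# and on every new Transaction tx, something like:
summary, _ = MonthTransaction.objects.get_or_create(
    receiver=tx.receiver,
    month=tx.date.replace(day=1),
    defaults={'total': 0},
)
summary.total += tx.value
summary.save()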

The problem with this approach is that it might not actually be faster, it's not very flexible, and the data could end up inconsistent if anything goes wrong while updating all the models.

Dev Catalin
    The database should be able to handle millions of entries. Check out https://stackoverflow.com/questions/2051481/django-table-with-million-of-rows – Matt Salzman Oct 16 '18 at 12:40

1 Answer


So to answer your question, yes and no. Querysets (Transaction.objects.filter(...)) DO get cached when they are evaluated, but generally this is not a problem.

What you are worried about is the actual aggregation query. So first, let me answer your general questions, then I will explain why...

  1. How fast is this?
    • Very fast. It is a single aggregate query that the database computes for you.
  2. What happens if the table grows to have millions of entries?
    • Nothing, it will be fine so long as you have the proper columns indexed.
  3. How do I make sure that it is optimized?
    • For the most part you don't have to; Django generates sensible SQL for you. Your job is mainly to index the columns you filter on.
  4. Does Django cache the values from Sum?
    • Yes and no. It caches the results of an evaluated queryset, but each aggregate() call is sent to the database (see the sketch after this list).
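
A minimal sketch of point 4 (the receiver name is just a placeholder):

from django.db.models import Sum

qs = Transaction.objects.filter(receiver="name")

list(qs)                    # evaluates the queryset; results are cached on qs
list(qs)                    # reuses the cached results, no extra query

qs.aggregate(Sum('value'))  # runs a query against the database
qs.aggregate(Sum('value'))  # runs another query; aggregate results are not cached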

The big picture is what we need to explain here. Django querysets translate down into SQL, and that SQL is built to express exactly the query you asked for. The SQL for Transaction.objects.filter(...).aggregate(...) would look something like

SELECT SUM("value")
FROM "myapp_transaction"
WHERE "receiver" = 'name'
  AND "date" BETWEEN 'start_date' AND 'end_date';
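
If you want to verify this yourself, you can look at the SQL Django generates (a quick sketch; aggregate() executes immediately, so it shows up in the connection's query log rather than on a queryset object):

from django.db import connection, reset_queries
from django.db.models import Sum

# requires DEBUG = True, otherwise Django does not record executed queries
reset_queries()
Transaction.objects.filter(receiver="name").aggregate(Sum('value'))
print(connection.queries[-1]['sql'])

# for a plain (non-aggregate) queryset you can also print the SQL directly:
print(Transaction.objects.filter(receiver="name").query)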

Your idea of breaking the data up into multiple tables is actually a really bad one. It will slow down writes and complicate your queries greatly, and it will leave you with a nightmare of duplicate data to manage. See the link that @matt-salzman provided about databases handling a lot of data. Keep a single table and index the columns you filter on instead.
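
A minimal sketch of what that indexing could look like, reusing your model (the max_length and the composite index are assumptions, adjust them to your needs):

from django.db import models

class Transaction(models.Model):
    receiver = models.CharField(max_length=100)
    value = models.DecimalField(max_digits=10, decimal_places=5)
    date = models.DateField()

    class Meta:
        # one composite index covering the typical filter:
        # all transactions for one receiver within a date range
        indexes = [models.Index(fields=['receiver', 'date'])]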

Julian