Let's say I have a Transaction model:
from django.db import models

class Transaction(models.Model):
    receiver = models.CharField(max_length=100)  # a receiver appears on many transactions
    value = models.DecimalField(max_digits=10, decimal_places=5)
    date = models.DateField()
Now, I have tens of thousands of transactions going into the table.
I need to show the receiver their transaction totals for each day, week, month, year, etc.
I can do this by using the following statement, filtering first for whatever date period I need and only then aggregating:

from django.db.models import Sum

transactions = (
    Transaction.objects
    .filter(receiver="name")
    .filter(date__gte=start_date, date__lte=end_date)
    .aggregate(Sum('value'))
)

(The filters have to come before aggregate(), since aggregate() returns a plain dict, not a queryset that can be filtered further.)
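For the per-day/per-month breakdowns, my understanding is that Django can group in a single query with values() and annotate() plus the Trunc database functions; a minimal sketch (the receiver name and variable names are just placeholders):

from django.db.models import Sum
from django.db.models.functions import TruncMonth

# One row per month for this receiver: values() + annotate()
# becomes a GROUP BY on the truncated date.
monthly_totals = (
    Transaction.objects
    .filter(receiver="name")
    .annotate(period=TruncMonth('date'))
    .values('period')
    .annotate(total=Sum('value'))
    .order_by('period')
)

# TruncDay, TruncWeek and TruncYear give the other granularities.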
How fast is this? What happens if the table grows to millions of entries? How do I make sure it stays optimized? Does Django cache the values from Sum?
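Before reaching for extra models, one thing I considered is a composite index covering the columns I filter on; a sketch (the index is my addition, not something Django creates by itself):

from django.db import models

class Transaction(models.Model):
    receiver = models.CharField(max_length=100)
    value = models.DecimalField(max_digits=10, decimal_places=5)
    date = models.DateField()

    class Meta:
        # Matches the query shape: equality on receiver, range on date.
        indexes = [
            models.Index(fields=['receiver', 'date']),
        ]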
My naive idea for optimizing this was to create more models: DayTransaction, MonthTransaction, YearTransaction, etc., and update all of them whenever I update the Transaction model.
This way, when users request the data, I read it from the models that "cache" the summed data, so no aggregation has to run; the query just retrieves rows from the respective tables, which are much smaller than the Transaction one.
The problem with this approach is that it might not actually be faster, it isn't very flexible, and the data could get out of sync if anything goes wrong while updating all the models.
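If I did go down the summary-table road, I assume the writes would at least have to be atomic so the tables can't drift apart; a rough sketch of what I have in mind (DayTransaction and record_transaction are hypothetical names):

from django.db import models, transaction
from django.db.models import F

class DayTransaction(models.Model):
    # Hypothetical per-day rollup: one row per (receiver, day).
    receiver = models.CharField(max_length=100)
    day = models.DateField()
    total = models.DecimalField(max_digits=12, decimal_places=5, default=0)

    class Meta:
        constraints = [
            models.UniqueConstraint(fields=['receiver', 'day'], name='uniq_receiver_day'),
        ]

def record_transaction(receiver, value, date):
    # Write the raw row and bump the rollup in one database transaction,
    # so a failure can't leave the two tables out of sync.
    with transaction.atomic():
        Transaction.objects.create(receiver=receiver, value=value, date=date)
        DayTransaction.objects.get_or_create(receiver=receiver, day=date)
        # F() does the increment in SQL, avoiding a read-modify-write race.
        DayTransaction.objects.filter(receiver=receiver, day=date).update(
            total=F('total') + value
        )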