I've run into an interesting situation in a new app I've added to an existing project. My goal is to (using a Celery task) update many rows at once with a value that includes annotated aggregated values from foreign keyed objects. Here are some example models that I've used in previous questions:
class Book(models.model):
author = models.CharField()
num_pages = models.IntegerField()
num_chapters = models.IntegerField()
class UserBookRead(models.Model):
user = models.ForeignKey(settings.AUTH_USER_MODEL)
user_book_stats = models.ForeignKey(UserBookStats)
book = models.ForeignKey(Book)
complete = models.BooleanField(default=False)
pages_read = models.IntegerField()
class UserBookStats(models.Model):
user = models.ForeignKey(settings.AUTH_USER_MODEL)
total_pages_read = models.IntegerField()
I'm attempting to:
- Use the
post_save
signal fromBook
instances to updatepages_read
on relatedUserBookRead
objects when aBook
page count is updated. - At the end of the signal, launch a background Celery task to roll up the
pages_read
from eachUserBookRead
which was updated, and update thetotal_pages_read
on each relatedUserBookStats
(This is where the problem occurs)
I'm trying to be as lean as possible as far as number of queries- step 1 is complete and only requires a few queries for my actual use case, which seems acceptable for a signal handler, as long as those queries are optimized properly.
Step 2 is more involved, hence the delegation to a background task. I've managed to accomplish most of it in a fairly clean manner (well, for me at least).
The problem I run into is that when annotating the UserBookStats
queryset with a total_pages
aggregation (the Sum()
of all pages_read
for related UserBookRead
objects), I can't follow that with a straight update
of the queryset to set the total_pages_read
field.
Here's the code (the Book
instance is passed to the task as book
):
# use the provided book instance to get the stats which need to be updated
book_read_objects= UserBookRead.objects.filter(book=book)
book_stat_objects = UserBookStats.objects.filter(id__in=book_read_objects.values_list('user_book_stats__id', flat=True).distinct())
# annotate top level stats objects with summed page count
book_stat_objects = book_stat_objects.annotate(total_pages=Sum(F('user_book_read__pages_read')))
# update the objects with that sum
book_stat_objects.update(total_pages_read=F('total_pages'))
On executing the last line, this error is thrown:
django.core.exceptions.FieldError: Aggregate functions are not allowed in this query
After some research, I found an existing Django ticket for this use case here, on which the last comment mentions 2 new features in 1.11 that could make it possible.
Is there any known/accepted way to accomplish this use case, perhaps using Subquery
or OuterRef
? I haven't had any success trying to fold in the aggregation as a Subquery
. The fallback here is:
for obj in book_stat_objects:
obj.total_pages_read = obj.total_pages
obj.save()
But with potentially tens of thousands of records in book_stat_objects
, I'm really trying to avoid issuing an UPDATE for each one individually.