How to get a Count based on a subquery?

Question

I am suffering to get a query working despite all I have been trying based on my web search, and I think I need some help before becoming crazy.

I have four models:

class Series(models.Model):
    puzzles = models.ManyToManyField(Puzzle, through='SeriesElement', related_name='series')
    ...

class Puzzle(models.Model):
    puzzles = models.ManyToManyField(Puzzle, through='SeriesElement', related_name='series')
    ...

class SeriesElement(models.Model):
    puzzle = models.ForeignKey(Puzzle,on_delete=models.CASCADE,verbose_name='Puzzle',)
    series = models.ForeignKey(Series,on_delete=models.CASCADE,verbose_name='Series',)
    puzzle_index = models.PositiveIntegerField(verbose_name='Order',default=0,editable=True,)

class Play(models.Model):
    puzzle = models.ForeignKey(Puzzle, on_delete=models.CASCADE, related_name='plays')
    user = models.ForeignKey(settings.AUTH_USER_MODEL, blank=True,null=True, on_delete=models.SET_NULL, related_name='plays')
    series = models.ForeignKey(Series, blank=True, null=True, on_delete=models.SET_NULL, related_name='plays')
    puzzle_completed = models.BooleanField(default=None, blank=False, null=False)
    ...

each user can play any puzzle several times, each time creating a Play record. that means that for a given set of (user,series,puzzle) we can have several Play records, some with puzzle_completed = True, some with puzzle_completed = False

What I am trying (unsuccesfully) to achieve, is to calculate for each series, through an annotation, the number of puzzles nb_completed_by_user and nb_not_completed_by_user.

For nb_completed_by_user, I have something which works in almost all cases (I have one glitch in one of my test that I cannot explain so far):

Series.objects.annotate(nb_completed_by_user=Count('puzzles',
filter=Q(puzzles__plays__puzzle_completed=True, 
    puzzles__plays__series_id=F('id'),puzzles__plays__user=user), distinct=True))

For nb_not_completed_by_user, I was able to make a query on Puzzle that gives me the good answer, but I am not able to transform it into a Subquery expression that works without throwing up an error, or to get a Count expression to give me the proper answer.

This one works:

puzzles = Puzzle.objects.filter(~Q(plays__puzzle_completed=True,
 plays__series_id=1, plays__user=user),series=s)

but when trying to move to a subquery, I cannot find the way to use the following expression not to throw the error:ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.

pzl_completed_by_user = Puzzle.objects.filter(plays__series_id=OuterRef('id')).exclude(
    plays__puzzle_completed=True,plays__series_id=OuterRef('id'), plays__user=user)

and the following Count expression doesn't give me the right result:

Series.objects.annotate(nb_not_completed_by_user=Count('puzzles', filter=~Q(
            puzzle__plays__puzzle_completed=True, puzzle__plays__series_id=F('id'), 
            puzzle__plays__user=user))

Could anybody explain me how I could obtain both values ? and eventually to propose me a link which explains clearly how to use subqueries for less-obvious cases than those in the official documentation

Thanks in advance

Edit March 2021: I recently found two posts which guided me through one potential solution to this specific issue: Django Count and Sum annotations interfere with each other and Django 1.11 Annotating a Subquery Aggregate

I implemented the proposed solution from https://stackoverflow.com/users/188/matthew-schinckel and https://stackoverflow.com/users/1164966/benoit-blanchon having help classes: class SubqueryCount(Subquery) and class SubquerySum(Subquery)

class SubqueryCount(Subquery):
    template = "(SELECT count(*) FROM (%(subquery)s) _count)"
    output_field = PositiveIntegerField()


class SubquerySum(Subquery):
    template = '(SELECT sum(_sum."%(column)s") FROM (%(subquery)s) _sum)'

    def __init__(self, queryset, column, output_field=None, **extra):
        if output_field is None:
            output_field = queryset.model._meta.get_field(column)
        super().__init__(queryset, output_field, column=column, **extra)

It works extremely well ! and is far quicker than the conventional Django Count annotation. ... at least in SQlite, and probably PostgreSQL as stated by others.

But when I tried in a MariaDB environnement ... it crashed ! MariaDB is apparently not able / not willing to handle correlated subqueries as those are considered sub-optimal.

In my case, as I try to get from the database multiple Count/distinct annotations for each record at the same time, I really see a tremendous gain in performance (in SQLite) that I would like to replicate in MariaDB.

Would anyone be able to help me figure out a way to implement those helper functions for MariaDB ?

What should template be in this environnement?

matthew-schinckel ? benoit-blanchon ? rktavi ?

@Bhargav Rao : What I provided was a solution that works in several environment. I found it a bit rough that you deleted it without a comment, or an explanation. As I am missing 9 credits to comment, It is a bit difficult for me to do otherwise that posting partial answers to get things going. Thanks for your understanding — Skratt, Mar 08 '21 at 11:23
You can edit out the follow up questions from the answer, make it a standalone solution and flag for it to be undeleted. — Bhargav Rao, Mar 08 '21 at 21:39
When you say "it crashed", do you mean it crashed MariaDB, or returned an error? If so, what was the error? It's likely to help understand the problem if we can see exactly what went wrong. — Matthew Schinckel, Mar 14 '21 at 22:53
@MatthewSchinckel : sorry abuse of language ; it returned OperationalError (1054, "Unknown column '.id' in 'where clause'") in python3.7/dist-packages/MySQLdb/connections.py in query (226. _mysql.connection.query(self, query) - Looking into it on MariaDB documentation, I found out that 'correlated subqueries' are not supported by MariaDB (mariadb.com/kb/en/subquery-limitations) — Skratt, Mar 16 '21 at 15:17

score 0 · Accepted Answer · answered Mar 13 '21 at 14:35

Going a bit deeper and analysis the Django docs a bit more in details, I was finally able to produce a satisfying way to produce a Count or Sum based on subquery.

For simplifying the process, I defined the following helper functions:

To generate the subquery:

def get_subquery(app_label, model_name, reference_to_model_object, filter_parameters={}):
    """
    Return a subquery from a given model (work with both FK & M2M)
    can add extra filter parameters as dictionary:

    Use:
        subquery = get_subquery(
                    app_label='puzzles', model_name='Puzzle',
                    reference_to_model_object='puzzle_family__target'
                    )
        or directly:
        qs.annotate(nb_puzzles=subquery_count(get_subquery(
            'puzzles', 'Puzzle','puzzle_family__target')),)
    """
    model = apps.get_model(app_label, model_name)

    # we need to declare a local dictionary to prevent the external dictionary to be changed by the update method:
    parameters = {f'{reference_to_model_object}__id': OuterRef('id')}
    parameters.update(filter_parameters)
    # putting '__id' instead of '_id' to work with both FK & M2M
    return model.objects.filter(**parameters).order_by().values(f'{reference_to_model_object}__id')

To count the subquery generated through get_subquery:

def subquery_count(subquery):
    """  
    Use:
        qs.annotate(nb_puzzles=subquery_count(get_subquery(
            'puzzles', 'Puzzle','puzzle_family__target')),)
    """
    return Coalesce(Subquery(subquery.annotate(count=Count('pk', distinct=True)).order_by().values('count'), output_field=PositiveIntegerField()), 0)

To sum the subquery generated through get_subquery on the field field_to_sum:

def subquery_sum(subquery, field_to_sum, output_field=None):
    """  
    Use:
        qs.annotate(total_points=subquery_sum(get_subquery(
            'puzzles', 'Puzzle','puzzle_family__target'),'points'),)
    """
    if output_field is None:
        output_field = queryset.model._meta.get_field(column)

    return Coalesce(Subquery(subquery.annotate(result=Sum(field_to_sum, output_field=output_field)).order_by().values('result'), output_field=output_field), 0)

The required imports:

from django.db.models import Count, Subquery, PositiveIntegerField, DecimalField, Sum
from django.db.models.functions import Coalesce

I spent so many hours on solving this ... I hope that this will save many of you all the frustration I went through figuring out the right way to proceed.

What is the SQL that this generates? How does this compare to the SQL that is generated by a more conventional database? Is there something we can add to Django's Subquery stuff to generate better correlated subqueries? — Matthew Schinckel, Mar 14 '21 at 22:51
(Also, the SQL that this generates could illustrate what the template could look like, simplifying your code to not require the wrapping). — Matthew Schinckel, Mar 14 '21 at 22:52
Looking at your solutions, you should just be able to use the Coalesce(...) stuff directly in your subquery annotation? — Matthew Schinckel, Mar 14 '21 at 23:10
@MatthewSchinckel: Thanks. I'll look into that further probably this weekend. — Skratt, Mar 16 '21 at 15:21

How to get a Count based on a subquery?

1 Answers1

Linked