1

I want to generate the following query:

select id, (select count(*) from B where B.x = A.x) as c from A

Which should be simple enough with the Subquery expression. Except I get a group by statement added to my count query which I can't get rid of:

from django.contrib.contenttypes.models import ContentType

str(ContentType.objects.annotate(c=F('id')).values('c').query)
# completely fine query with annotated field
'SELECT "django_content_type"."id" AS "c" FROM "django_content_type"'

str(ContentType.objects.annotate(c=Count('*')).values('c').query)
# gets group by for every single field out of nowhere
'SELECT COUNT(*) AS "c" FROM "django_content_type" GROUP BY "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model"'

Which makes the result be [{'c': 1}, {'c': 1}, {'c': 1}, {'c': 1},...] instead of [{c:20}]. But subqueries have to have only one row of result to be usable.

Since the query is supposed to be used in a subquery I can't use .count() or .aggregate() either since those evaluate instantly and complain about the usage of OuterRef expression.

Example with subquery:

str(ContentType.objects.annotate(fields=Subquery(
    Field.objects.filter(model_id=OuterRef('pk')).annotate(c=Count('*')).values('c')
)).query)

Generates

SELECT "django_content_type"."id",
       "django_content_type"."app_label",
       "django_content_type"."model",
       (SELECT COUNT(*) AS "c"
        FROM "meta_field" U0
        WHERE U0."model_id" = ("django_content_type"."id")
        GROUP BY U0."id", U0."model_id", U0."module", U0."name", U0."label", U0."widget", U0."visible", U0."readonly",
                 U0."desc", U0."type", U0."type_model_id", U0."type_meta_id", U0."is_type_meta", U0."multi",
                 U0."translatable", U0."conditions") AS "fields"
FROM "django_content_type"

Expected query:

SELECT "django_content_type"."id",
       "django_content_type"."app_label",
       "django_content_type"."model",
       (SELECT COUNT(*) AS "c"
        FROM "meta_field" U0
        WHERE U0."model_id" = ("django_content_type"."id")) AS "fields"
FROM "django_content_type"

Update: (to add models from real app requested in comments):

class Translation(models.Model):
    field = models.ForeignKey(MetaField, models.CASCADE)
    ref_id = models.IntegerField()
    # ... other fields

class Choice(models.Model):
    meta = models.ForeignKey(MetaField, on_delete=models.PROTECT)
    # ... other fields

I need a query to get number of Translations available for each choice where Translation.field_id refers to Choice.meta_id and Translation.ref_id refers to Choice.id.

The reason there are no foreign keys is that not all meta fields are choice fields (e.g. text fields may also have translations). I could make a separate table for each translatable entity, but this setup should be easy to use with a count subquery that doesn't have a group by statement in it.

Bor691
  • 606
  • 1
  • 11
  • 21
  • I don't understand why you can't do `ContentType.objects.annotate(c=Count('field')).values('id', 'c')` assuming `Field` has a fk to `ContentType`. – dirkgroten Jan 07 '20 at 16:57
  • Subqueries need to only have one field selected. otherwise, the database won't know which field to extract for the outer query. – Bor691 Jan 07 '20 at 16:58
  • @dirkgroten Oh, you meant like not use a subquery at all, that won't work because the actual models don't have a simple relation. it's based on multiple fields. This was a simplified example to get the point about getting rid of `group by` across. – Bor691 Jan 07 '20 at 17:01
  • Then it would have helped to show the actual models. – dirkgroten Jan 07 '20 at 17:05
  • @dirkgroten I thought the actual/expected query would be enough, given that I only need to remove part of the generated query while the rest of it is correct. But I added parts of the actual models. Thanks for your time. – Bor691 Jan 07 '20 at 17:15
  • `annotate` is the one creating the group query, because `annotate` always just considers one row at a time. `Field.objects.filter(model_id=pk).annotate(c=Count('*'))` never counts across multiple rows. It just does c=1 for every row – dirkgroten Jan 07 '20 at 17:18
  • @dirkgroten Is there an alternative to annotate that would add aliases/custom selects without adding group by, or an alternative to aggregate that doesn't evaluate instantly so it could be used in a subquery expression? – Bor691 Jan 07 '20 at 17:21
  • 1
    `Choice.objects.annotate(c=Count('meta__translation', filter=Q(meta__translation__ref_id=F('id'))))` – dirkgroten Jan 07 '20 at 17:22
  • maybe with the addition of `distinct=True` in the `Count()` – dirkgroten Jan 07 '20 at 17:26
  • @dirkgroten Hmm, that would work in this specific case (although with 3 joins and a complicated group by clause (including dynamic columns) instead of a simple subquery) since they both have a relation to meta table, but I'm keeping the question open for the more generic answer. since aggregated subqueries are not that uncommon. (also I think a subquery would be faster?) – Bor691 Jan 07 '20 at 17:26
  • 1
    Another approach does not exist in the Django ORM, AFAIK. Use raw queries. – dirkgroten Jan 07 '20 at 17:27

2 Answers2

2

UPDATE Here's a query using subquery that should come close to what you want:

str(ContentType.objects.annotate(fields=Subquery(
    Field.objects.filter(model_id=OuterRef('pk')).values('model').annotate(c=Count('pk')).values('c')
)).query)

The only thing I did was adding the values('model') group_by clause which makes the Count('pk') actually work since it aggregates all rows into one.

It will return null instead of 0 when there are no related rows, which you can probably transform to 0 using a Coalesce function or a Case ... When ... then.

The exact query you want isn't possible with the Django ORM, although you can achieve the same result with

Choice.objects.annotate(c=Count(
    'meta__translation',
    distinct=True,
    filter=Q(meta__translation__ref_id=F('id'))
))

Alternatively look at the django-sql-utils package, as also mentioned in this post.

dirkgroten
  • 20,112
  • 2
  • 29
  • 42
  • Could you also provide an example of how to use the package you suggested? I installed it but it doesn't take a subquery expression just adding `count` to it. It takes a string, which is supposedly a relation/field on the model. How can I get it to count instances of another table with no relation to the current one? – Bor691 Jan 07 '20 at 17:45
  • As for your update, I figured that a `values()` before annotate can affect the `group by` clause, but that still doesn't give the same result (although very close). The main difference is null vs 0 in for fields with no matching rows. But that can be solved with a coalesce. I didn't think to use columns in the where clause for group by. – Bor691 Jan 07 '20 at 17:48
  • 1
    With sql-utils: `Choice.objects.annotate(c=SubqueryCount(‘meta__translation', filter=Q(meta__translation__ref_id=F('id'))))` – dirkgroten Jan 07 '20 at 21:21
0

It is a bit of a dirty hack, but after diving inside Django's ORM code, I found the following works wonderfully for me (I am trying to use your own example's subquery):

counting_subquery = Subquery( Field.objects
                                   .filter( model_id = OuterRef( 'pk' ) )
                                   .annotate( c = Count( '*' ) )
                                   .values('c') )

# Note: the next line fixes a bug in the Django ORM, where the subquery defined above
# triggers an unwanted group_by clause in the generated SQL which ruins the count operation.
counting_subquery.query.group_by = True

results = ContentType.objects
                     .annotate( fields_count = Subquery( counting_subquery ) )
                     ...

The key is setting group_by to True. That gets rid of the unwanted group_by clause in your SQL.

I am not happy about it, as it relies on Django's undocumented behaviour to work. But I can live with it; I am even less happy about the maintainability of using direct SQL in the subquery...

tdy
  • 36,675
  • 19
  • 86
  • 83