2

Using django with a MySQL DB and given these models:

ModelB   ---FK--->   ModelA
    - ref_type
    - ref_id
 
ModelC

I want to get all the ModelC for each ModelA via an annotation.

I tried many options looking at existing solutions but could not make it work. The following code works when there is just one ModelC for each ModelA but as soon as there is more than one, I get the Subquery returns more than 1 row error and I don't know how to get a list of the ModelC models instead. Ideally, I'd like to build a list of JSON objects of the ModelC.

qs = ModelA.objects.all()

c_ids = (
    ModelB.objects \
        .filter(modela_id=OuterRef(OuterRef('id')), ref_type='c') \
        .values('ref_id')
)
all_c = (
    ModelC.objects \
        .filter(id__in=Subquery(c_ids)) \
        .values('id')
)

qs1 = qs.annotate(all_c=Subquery(all_c ))
for p in qs1:
    print(p, p.all_c)

Michael
  • 8,357
  • 20
  • 58
  • 86

3 Answers3

1

The following should do

from django.db.models import JSONField
from django.db.models.aggregates import Aggregate

class JSONArrayAgg(Aggregate):
   function = "JSON_ARRAYAGG"
   output_field = JSONField()
   
ModelA.objects.annotate(
    all_c=Subquery(
        ModelB.objects.filter(
            ref_type="c",
            modela_id=OuterRef("id"),
        ).values(
            "modela_id"
        ).values_list(
            JSONArrayAgg("ref_id")
        )
    )
)

which translates to

SELECT
    model_a.*,
    (SELECT JSON_ARRAYAGG(model_b.ref_id)
     FROM model_b
     WHERE model_b.ref_type = "c" AND model_b.modela_id = model_a.id
     GROUP BY model_b.modela_id
    ) all_c
FROM model_a

But it would be much easier if you provided your exact model definition as it's likely only a matter of doing something along the lines of (JSONArrayAgg.filter cannot be used due to a MySQL bug.

ModelA.objects.filter(
   modelb_set__ref_type="c",
).annotate(
   all_c=JSONArrayAgg("modelb_set__ref_id")
)

which translate to

SELECT
    model_a.*,
    JSON_ARRAYAGG(model_b.ref_id)
FROM model_a
INNER JOIN model_b ON (model_b.modela_id = model_a.id)
WHERE model_b.ref_type = "c"
GROUP BY model_a.id

You could also use FilteredRelation if you want the condition to be pushed to the JOIN instead.

ModelA.objects.annotate(
   all_c_rel=FilteredRelation(
      "modelb_set", Q(modelb_set__ref_type="c")
   ),
   all_c=JSONArrayAgg("all_c_rel__ref_id")
)

Which results in

SELECT
    model_a.*,
    JSON_ARRAYAGG(model_b.ref_id)
FROM model_a
LEFT OUTER JOIN model_b ON (
    model_b.modela_id = model_a.id
    AND model_b.ref_type = "c"
)
GROUP BY model_a.id

But the LEFT OUTER JOIN might re-surface the issue you have with MySQL's handling of NULL in JSON_ARRAYAGG.

Simon Charette
  • 5,009
  • 1
  • 25
  • 33
  • Thanks Simon. I am trying the 2nd approach you added. However, I get this error `ValueError: invalid literal for int() with base 10: '[null]'` That might be because some objects have no related objects. Any idea how to work around this? – Michael Sep 16 '22 at 16:33
  • Ok the issue was multifold. The `JSONArrayAgg` was missing an `output_field = JSONField()` which I've just added and it also seems like [MySQL has a bug](https://bugs.mysql.com/bug.php?id=90835) in `JSON_ARRAYAGG` which doesn't exclude `NULL` values from the aggregate like it should. I'll adjust the queryset to account for that. – Simon Charette Sep 16 '22 at 16:53
  • @Michael any update on the above. I've tested all the solutions on MySQL and they seem to work? – Simon Charette Sep 19 '22 at 15:17
  • I was still hitting some issues but your answer is the closest and the most detailed so I will accept it anyway. Thank you for your help. – Michael Sep 19 '22 at 19:16
0

ModelB looks like a junction table. Having an id pointing to A and C

Django supports junction tables.

But when it comes to annotation the objects with the list of ids, I'm not entire sure if that is possible purely by the ORM.

class ModelA(models.Model):
    model_c_objects = models.ManyToManyField("ModelC", through="ModelB") 

class ModelB(models.Model):
    model_a = models.ForeignKey(ModelA, on_delete=models.CASCADE)
    model_b = models.ForeignKey(ModelB, on_delete=models.CASCADE)

class ModelC(models.Model):
    ...


# This one here I have no idea if it would work or not
ModalA.objects.prefetch_related("models_c_objects").annotate(model_c_object_ids=ArrayAgg("model_c_objects__id")

# If it doesn't:
class ModelA(models.Model):
    model_c_objects = models.ManyToManyField("ModelC", through="ModelB") 
    
    @property
    def model_c_object_ids(self):
        return list(self.model_c_objects.values("id", flat=True))

# And you can then use it like you wished
for model_a_object in ModelA.objects.prefetch_related("models_c_objects"):
    model_a_object.model_c_object_ids # list of model_c ids like: [1,4,12,63]

I'm feeling a bit lazy but either of the two solutions should work and they both use a single query.

Işık Kaplan
  • 2,815
  • 2
  • 13
  • 28
  • ArrayAgg is not working in MySQL and the second option creates a query for every Model A. – preator Sep 12 '22 at 16:36
  • Yes I found out that it would be easier on PostgreSQL but I don't have the choice of the DB at the moment. – Michael Sep 12 '22 at 19:36
  • @preator those values are already prefetched, how do they create a new query per row? – Işık Kaplan Sep 12 '22 at 20:11
  • @Michael ah, my bad, I didn't see MySQL part. Care to give the second one a shot? – Işık Kaplan Sep 12 '22 at 20:11
  • `prefetch_related("models_c_objects")` does not work because there is no true relationship defined in the ORM/database. There is not a true FK from ModelB to ModelC. It's a legacy DB and ModelB is setup so that the c_id sometimes refers to ModelC objects and sometimes other things so there is another field that identifies the relationship. I did not think it was needed for the question but now realizing it is so I will update the question. Thank you. – Michael Sep 12 '22 at 20:55
  • prefetching works for access on objects so if you would do some nested forloop to extract the ids it would work without making additional query. `.values()` executes the queryset again every time it is called. Also prefetch only makes sense when you use the objects not values as in `values` scenario the ORM knows precisely what data will be returned, thus what joins are required for the query. – preator Sep 16 '22 at 08:24
0

I come from the assumption that Model B is indeed a through table for M2M relationship between Model A and Model C as Işık Kaplan suggested.

In Postgres you could use ArrayAgg like Işık Kaplan suggested. Equivalent in MySQL in GROUP_CONCAT but it is not present in the ORM out of the box. Also from personal experience I wouldn't recommend it as it performed terribly in my use case.

What I ended up doing was combining 2 queries using Python which was way faster then 1 complicated query with GROUP_CONCAT (around 60K records of "Model A" and 20K of "Model B" in my case). In your case it would look like this:

a_qs = ModelA.objects.all()
c_ids_dict = defaultdict(list)
c_ids = a_qs.values("id", "models_c_objects__id")
for item in c_ids:
    if item["models_c_objects__id"]:
        c_ids_dict[item["id"]].append(item["models_c_objects__id"])
for p in a_qs:
    print(p, c_ids_dict.get(p.id, []))
preator
  • 984
  • 5
  • 6
  • Right, I found out that it would be easier on PostgreSQL but I don't have the choice of the DB at the moment. Also, there is not a true FK from ModelB to ModelC. It's a legacy DB and ModelB is setup so that the `c_id` sometimes refers to ModelC objects and sometimes other things so there is another field that identifies the relationship. I did not think it was needed for the question but now realizing it is so I will update the question. – Michael Sep 12 '22 at 19:41