How to use PostgreSQL ArrayAgg function on a grouped by query set in Django ORM?

Question

To keep it simple I have four tables(A, B, Category and Relation), Relation table stores the Intensity of A in B and Category stores the type of B.

A <--- Relation ---> B ---> Category

I am trying to eliminate joins in my query to reduce calculations time using PostgreSQL's ArrayAgg and indexing feature of the database based on this gist. (Since there are 18000 relations, 4000 Bs, and 1500 categories my calculations of each report would take almost two hours) and the error I get is: psycopg2.ProgrammingError: aggregate functions are not allowed in GROUP BY LINE 1: ... U0."id", U2."B" HAVING U0."id" = ANY((ARRAY_AGG(...

I have used Brad Martsberger solution to my previous question to calculate sum of each A's intensity occurred in B grouped by B's categories, Minimum and Maximum of calculated Intensity-sums in each category of B and the rate of occurrence of each A in each B category and the occurrence of B itself in that category :

annotation0 = {
        'SumIntensity': Sum('ARelation__Intensity'),
        'A_Ids': ArrayAgg('id')
    } 
annotation1 = {
        'BOccurrence' : Count('id', distinct=True),
    }
sub_filter0 = Q(id__any=OuterRef('A_Ids'))
sub_filter1 = Q(Category_id=OuterRef('ARelation__B__Category_id'))
subquery0 = A.objects.filter(sub_filter0).values('id','ARelation__B__Category_id').annotate(**annotation0).order_by('-SumIntensity').values('SumIntensity')[:1]
subquery1 = A.objects.filter(sub_filter0).values('id','ARelation__B__Category_id').annotate(**annotation0).order_by('SumIntensity').values('SumIntensity')[:1]
subquery2 = B.objects.filter(sub_filter1).values('Category_id').annotate(**annotation1).values('BOccurrence')[:1]
result = A.objects.values(
        'id','id','ARelation__B__Category_id'
    ).annotate(
        **annotation0
    )
result = result.annotate(MaxAIntensity=Subquery(subquery0))
result = result.annotate(MinAIntensity=Subquery(subquery1))
result = result.annotate(BOccurrence=Subquery(subquery2))
result = result.annotate(
        AOccurrence=Count('id', distinct=False)
)

How can I resolve the issue?

Is there any other efficient alternatives to this?

Well, I think you should try to use window expressions https://docs.djangoproject.com/en/2.0/ref/models/expressions/#window-functions — Basalex, Jan 24 '18 at 08:16
Thank you for the insight @Basalex. Is it possible to pass A_Ids (ArrayAgg('id')) to as an partition_by argument? I see that it's not filterable, does it mean that filtering the query have no effect on function? — Azee, Jan 24 '18 at 08:30
Well, I suggest use window expressions instead of grouping by. But it`s just a suggestion. I`m not 100% sure. I`ve not tried to use ArrayAgg with partition_by, so have no idea about it, but I don`t think it will work. — Basalex, Jan 24 '18 at 09:15
Thank you anyway @Basalex, I'm going to give it a try and post you up. — Azee, Jan 24 '18 at 09:23
Using Window, I get the error: django.core.exceptions.FieldError: Cannot compute Max('SumIntensity'): 'SumIntensity' is an aggregate. It seems it can't handle aggregation over annotated sum in partition which was the reason why we used Subquery at first place and I don't see a way to use aggregate and [:1] in window. Any thoughts? @Basalex — Azee, Jan 25 '18 at 07:43

How to use PostgreSQL ArrayAgg function on a grouped by query set in Django ORM?

0 Answers0