I have a model for which I want to perform a group-by on two values and calculate the percentages of each value per outer grouping.
Currently I just make a query to get all the rows and put them into a pandas dataframe and perform something similar to the answer here. Although this works I'm sure it would be more efficient if I could make the query return the information I require directly.
I am currently running Django 2.0.5
with a backend DB on PostgreSQL 9.6.8
I think window functions could be the solution as indicated here but I cannot construct a successful combination of annotate
and values
to give me the desired output.
Another possible solution could be rollup
introduced in PostgreSQL 9.5
if I can find a way to get the summary row as a set of extra columns for each row? But I also think it's not yet supported by Django.
Model:
class ModelA(models.Model):
grouper1 = models.CharField()
grouper2 = models.CharField()
metric1 = models.IntegerField()
All rows:
grouper1 | grouper2 | metric1
---------+----------+---------
A | C | 2
A | C | 2
A | C | 2
A | D | 4
A | D | 4
A | D | 4
B | C | 5
B | C | 5
B | C | 5
B | D | 6
B | D | 4
B | D | 5
Desired output:
grouper1 | grouper2 | sum(metric1) | Percentage
---------+----------+--------------+-----------
A | C | 6 | 40
A | D | 12 | 60
B | C | 15 | 50
B | D | 15 | 50
I got close to what I expected with
ModelA.objects.all(
).values(
'grouper1',
'grouper2'
).annotate(
SumMetric1=Window(expression=Sum('metric1'), partition_by=[F('grouper1'), F('grouper2')]),
GroupSumMetric1=Window(expression=Sum('metric1'), partition_by=[F('grouper1')])
)
However this returns a row for every original row in the database like so:
grouper1 | grouper2 | sum(metric1) | Percentage
---------+----------+--------------+-----------
A | C | 6 | 40
A | C | 6 | 40
A | C | 6 | 40
A | D | 12 | 60
A | D | 12 | 60
A | D | 12 | 60
B | C | 15 | 50
B | C | 15 | 50
B | C | 15 | 50
B | C | 15 | 50
B | C | 15 | 50
B | D | 15 | 50