49

Prologue:

This is a question arising often in SO:

I have composed an example on SO Documentation but since the Documentation will get shut down on August 8, 2017, I will follow the suggestion of this widely upvoted and discussed meta answer and transform my example to a self-answered post.

Of course, I would be more than happy to see any different approach as well!!


Question:

Assume the model:

class Books(models.Model):
    title  = models.CharField()
    author = models.CharField()
    price = models.FloatField()

How can I perform the following queries on that model utilizing Django ORM:

  • GROUP BY ... COUNT:

    SELECT author, COUNT(author) AS count
    FROM myapp_books GROUP BY author
    
  • GROUP BY ... SUM:

    SELECT author,  SUM (price) AS total_price
    FROM myapp_books GROUP BY author
    
John Moutafis
  • 22,254
  • 11
  • 68
  • 112

2 Answers2

104

We can perform a GROUP BY ... COUNT or a GROUP BY ... SUM SQL equivalent queries on Django ORM, with the use of annotate(), values(), the django.db.models's Count and Sum methods respectfully and optionally the order_by() method:

  • GROUP BY ... COUNT:

     from django.db.models import Count
    
     result = Books.objects.values('author')
                           .order_by('author')
                           .annotate(count=Count('author'))
    

    Now result contains a dictionary with two keys: author and count:

       author    | count
     ------------|-------
      OneAuthor  |   5
     OtherAuthor |   2
        ...      |  ...
    
  • GROUP BY ... SUM:

     from django.db.models import Sum
    
      result = Books.objects.values('author')
                            .order_by('author')
                            .annotate(total_price=Sum('price'))
    

    Now result contains a dictionary with two columns: author and total_price:

       author    | total_price
     ------------|-------------
      OneAuthor  |    100.35
     OtherAuthor |     50.00
         ...     |      ...
    

UPDATE 13/04/2021

As @dgw points out in the comments, in the case that the model uses a meta option to order rows (ex. ordering), the order_by() clause is paramount for the success of the aggregation!

John Moutafis
  • 22,254
  • 11
  • 68
  • 112
  • You should also add joined tables with group by and "having" filters. For me it's counterintuitive because in SQL, you usually start with the parent and in django you start with the child. – Henrietta Martingale Sep 16 '19 at 13:17
  • @HenriettaMartingale If I understand correctly what you mean, you can use `filter` before extracting the `values`. – John Moutafis Sep 16 '19 at 14:26
  • You mean filter again after annotate, and the orm is smart enough to know it needs to do having? – Henrietta Martingale Oct 17 '19 at 15:06
  • 2
    Here's what worked for me: statement_line.objects.filter(pay_date__lt='2019-10-31').select_related('ae').values('ae__opp_own').annotate(tots=Sum('amt')).filter(tots__gt=0) The critical key was the select related and the double underline for parent field name. The second filter does turn to "having". str([obj].query) confirms this. Another handy thing. – Henrietta Martingale Oct 17 '19 at 15:51
  • @HenriettaMartingale This Q&A style is more general and what you have composed here doesn't really relate to the question or the answer. Are you sure this was going as a comment to this post? – John Moutafis Oct 17 '19 at 16:45
  • Your fourth bullet point above included "join" queries, but your examples only had one model and thus table. Joins usually uses two or more so I added the select related and double underline items in there. – Henrietta Martingale Oct 21 '19 at 16:13
  • 2
    Maybe one should emphasize the `order_by(...)` part. If the model uses different columns for ordering, omitting the `order_by()` clause will cause the aggregation to fail. – dgw Apr 13 '21 at 14:00
  • @dgw Good catch, I have updated the post accordingly! – John Moutafis Apr 13 '21 at 14:50
0

in group by SUM() you can get almost two dict objects like

inv_data_tot_paid =Invoice.objects.aggregate(total=Sum('amount', filter=Q(status = True,month = m,created_at__year=y)),paid=Sum('amount', filter=Q(status = True,month = m,created_at__year=y,paid=1)))
print(inv_data_tot_paid)
##output -{'total': 103456, 'paid': None}

do not try out more than two query filter otherwise, you will get error like

Saeed
  • 3,294
  • 5
  • 35
  • 52