497

I query a model:

Members.objects.all()

And it returns:

Eric, Salesman, X-Shop
Freddie, Manager, X2-Shop
Teddy, Salesman, X2-Shop
Sean, Manager, X2-Shop

What I want is to know the best Django way to fire a group_by query to my database, like:

Members.objects.all().group_by('designation')

Which doesn't work, of course. I know we can do some tricks on django/db/models/query.py, but I am just curious to know how to do it without patching.

ivanleoncz
  • 9,070
  • 7
  • 57
  • 49
simplyharsh
  • 35,488
  • 12
  • 65
  • 73

13 Answers13

748

If you mean to do aggregation you can use the aggregation features of the ORM:

from django.db.models import Count
result = (Members.objects
    .values('designation')
    .annotate(dcount=Count('designation'))
    .order_by()
)

This results in a query similar to

SELECT designation, COUNT(designation) AS dcount
FROM members GROUP BY designation

and the output would be of the form

[{'designation': 'Salesman', 'dcount': 2}, 
 {'designation': 'Manager', 'dcount': 2}]

If you don't include the order_by(), you may get incorrect results if the default sorting is not what you expect.

If you want to include multiple fields in the results, just add them as arguments to values, for example:

    .values('designation', 'first_name', 'last_name')

References:

Flimm
  • 136,138
  • 45
  • 251
  • 267
Guðmundur H
  • 11,478
  • 3
  • 24
  • 22
  • 1
    You wont believe i was juggling with same piece of code currently. Yeah 1.1 has quite a good things to watch. – simplyharsh Mar 10 '09 at 11:18
  • 3
    How would you add another filter to lets say look for distinct values by a date? – Harry Aug 31 '11 at 08:38
  • 8
    @Harry: You can chain it. Something like: `Members.objects.filter(date=some_date).values('designation').annotate(dcount=Count('designation'))` – Eli May 15 '13 at 23:14
  • 93
    i have a question, this query is only returning designation and dcount, what if i want to get other values of the table too? – A.J. Mar 05 '14 at 08:02
  • 24
    Note that if your sorting is a field other than designation, it will not work without resetting the sort. See http://stackoverflow.com/a/1341667/202137 – Gidgidonihah May 05 '14 at 19:41
  • 14
    @Gidgidonihah True, the example should read `Members.objects.order_by('disignation').values('designation').annotate(dcount=Count('designation'))` – bjunix Oct 30 '14 at 15:16
  • 5
    I is possible to perform the query without the `count` ? – aDoN Dec 22 '15 at 10:36
  • 23
    i have a question, this query is only returning designation and dcount, what if i want to get other values of the table too? – Yann叶 Oct 26 '16 at 02:18
  • 4
    @Clayton: Put the other fields inside values parenthesis like this: Members.objects.values('designation', 'field2', 'field3').annotate(dcount=Count('designation')) – MouTio Nov 07 '16 at 12:13
  • @Yann叶: Put the other fields inside values parenthesis like this: Members.objects.values('designation', 'field2', 'field3').annotate(dcount=Count('designation')) – MouTio Nov 07 '16 at 12:13
  • sometimes I annotating this way, its equivalent to *SELECT designation, COUNT(designation) AS dcount FROM members GROUP BY members.id* . Why this happening ? – JPG Nov 06 '17 at 16:25
  • 2
    This MUST(though optional) ends with `.order_by(designation)`, the reason being is depending on the model or database their would be an implicit `order_by` in place totally breaking the grouping mechanism. So being explicit saves here. – nehem Jan 17 '18 at 00:16
  • 1
    @MouTio if I put another fields into values, then it will break the group by rule, ex: values('designation', 'field2'), then it will group by field2 instead of designation. Is there any other way to achieve query like this? Select Max(field1), field2, field3 from table group by field2 – paugoo Sep 21 '18 at 11:12
  • 2
    To explain @nehemiah's comment a bit (took me 15 minutes to realize I was hitting this case in a codebase I'm unfamiliar with), an `ordering` meta attribute on the model automatically adds those fields to the `GROUP BY` in addition to the one in `.values()`. Inserting an `.order_by()` into the query prevents those extra fields from being added. – Izkata Jun 14 '19 at 15:33
  • How to select the other fields like first_name,last_name without putting this into group by – Braham Dev Yadav Aug 07 '19 at 06:42
  • @A.J. I needed a normal `group_by ` i ended up doing a normal query for all then add the field i need to a set to remove duplicates. `questionnaires = set() questions = Question.objects.all() for question in questions: questionnaires.add(question.set_id)` – user3821178 Sep 12 '19 at 06:33
  • 1
    So the answer works fine but what i think is we cannot just add fields to the values method to return multiple fields. When we add multiple fields to it, then if the fields have different values then they won't be grouped by annotate (that is what happened with me). It only works when you want to group as per some column or columns with same value. So how i understood it is that values produces a dict and then annotate compares those dicts and groups them accordingly...please correct me if i am wrong – Irfan wani Mar 26 '22 at 03:43
  • How to access to name and shop name in result? – reza_khalafi Oct 29 '22 at 13:30
78

An easy solution, but not the proper way is to use raw SQL:

results = Members.objects.raw('SELECT * FROM myapp_members GROUP BY designation')

Another solution is to use the group_by property:

query = Members.objects.all().query
query.group_by = ['designation']
results = QuerySet(query=query, model=Members)

You can now iterate over the results variable to retrieve your results. Note that group_by is not documented and may be changed in future version of Django.

And... why do you want to use group_by? If you don't use aggregation, you can use order_by to achieve an alike result.

daaawx
  • 3,273
  • 2
  • 17
  • 16
Michael
  • 780
  • 6
  • 14
  • 1
    Can you please tell me how to do it using order_by?? – simplyharsh Mar 10 '09 at 11:19
  • 2
    Hi, if you are not using aggregation you could emulate group_by by using an order_by and eliminate the entries you don't need. Of course, this is an emulation and is only useable when using not a lot of data. Since he didn't speak of aggregation, I thought it could be a solution. – Michael Mar 11 '09 at 10:28
  • Hey this is great - can you please explain how to the use execute_sql it doesn't appear to work.. – rh0dium Jul 12 '12 at 23:47
  • 19
    Note this no longer works on Django 1.9. http://stackoverflow.com/questions/35558120/query-group-by-in-django-1-9 – grokpot Mar 08 '17 at 18:46
  • 2
    This is kind of a hack-ish way to use the ORM. You shouldn't have to instantiate new querysets passing in old ones manually. – Ian Kirkpatrick Apr 19 '18 at 13:34
59

You can also use the regroup template tag to group by attributes. From the docs:

cities = [
    {'name': 'Mumbai', 'population': '19,000,000', 'country': 'India'},
    {'name': 'Calcutta', 'population': '15,000,000', 'country': 'India'},
    {'name': 'New York', 'population': '20,000,000', 'country': 'USA'},
    {'name': 'Chicago', 'population': '7,000,000', 'country': 'USA'},
    {'name': 'Tokyo', 'population': '33,000,000', 'country': 'Japan'},
]

...

{% regroup cities by country as countries_list %}

<ul>
    {% for country in countries_list %}
        <li>{{ country.grouper }}
            <ul>
            {% for city in country.list %}
                <li>{{ city.name }}: {{ city.population }}</li>
            {% endfor %}
            </ul>
        </li>
    {% endfor %}
</ul>

Looks like this:

  • India
    • Mumbai: 19,000,000
    • Calcutta: 15,000,000
  • USA
    • New York: 20,000,000
    • Chicago: 7,000,000
  • Japan
    • Tokyo: 33,000,000

It also works on QuerySets I believe.

source: https://docs.djangoproject.com/en/2.1/ref/templates/builtins/#regroup

edit: note the regroup tag does not work as you would expect it to if your list of dictionaries is not key-sorted. It works iteratively. So sort your list (or query set) by the key of the grouper before passing it to the regroup tag.

inostia
  • 7,777
  • 3
  • 30
  • 33
  • 2
    This is perfect! I've searched a lot for a simple way to do this. And it works on querysets as well, that's how I used it. – CarmenA Feb 27 '18 at 11:43
  • 4
    this is totally wrong if you read from database big set of data and then just use aggregated values. – Sławomir Lenart Mar 02 '18 at 12:34
  • 1
    @SławomirLenart sure, this might not be as efficient as a straight DB query. But for simple use cases it can be a nice solution – inostia Mar 02 '18 at 18:55
  • This will work if the result shown in template. But, for JsonResponse or other indirect response. this solution will not work. – Willy satrio nugroho Jul 20 '18 at 08:53
  • 1
    @Willysatrionugroho if you wanted to do it in a view, for example, https://stackoverflow.com/questions/477820/whats-the-idiomatic-python-equivalent-to-djangos-regroup-template-tag might work for you – inostia Jul 20 '18 at 17:32
10

Django does not support free group by queries. I learned it in the very bad way. ORM is not designed to support stuff like what you want to do, without using custom SQL. You are limited to:

  • RAW sql (i.e. MyModel.objects.raw())
  • cr.execute sentences (and a hand-made parsing of the result).
  • .annotate() (the group by sentences are performed in the child model for .annotate(), in examples like aggregating lines_count=Count('lines'))).

Over a queryset qs you can call qs.query.group_by = ['field1', 'field2', ...] but it is risky if you don't know what query are you editing and have no guarantee that it will work and not break internals of the QuerySet object. Besides, it is an internal (undocumented) API you should not access directly without risking the code not being anymore compatible with future Django versions.

Luis Masuelli
  • 12,079
  • 10
  • 49
  • 87
9

You could also use pythons built-in itertools.groupby directly:

from itertools import groupby

designation_key_func = lambda member: member.designation
queryset = Members.objects.all().select_related("designation")

for designation, member_group in groupby(queryset, designation_key_func):
    print(f"{designation} : {list(member_group)}")

No raw sql, subqueries, third-party-libs or templatetags needed and pythonic and explicit in my eyes.

ralfzen
  • 327
  • 3
  • 8
8

The following module allows you to group Django models and still work with a QuerySet in the result: https://github.com/kako-nawao/django-group-by

For example:

from django_group_by import GroupByMixin

class BookQuerySet(QuerySet, GroupByMixin):
    pass

class Book(Model):
    title = TextField(...)
    author = ForeignKey(User, ...)
    shop = ForeignKey(Shop, ...)
    price = DecimalField(...)

class GroupedBookListView(PaginationMixin, ListView):
    template_name = 'book/books.html'
    model = Book
    paginate_by = 100

    def get_queryset(self):
        return Book.objects.group_by('title', 'author').annotate(
            shop_count=Count('shop'), price_avg=Avg('price')).order_by(
            'name', 'author').distinct()

    def get_context_data(self, **kwargs):
        return super().get_context_data(total_count=self.get_queryset().count(), **kwargs)

'book/books.html'

<ul>
{% for book in object_list %}
    <li>
        <h2>{{ book.title }}</td>
        <p>{{ book.author.last_name }}, {{ book.author.first_name }}</p>
        <p>{{ book.shop_count }}</p>
        <p>{{ book.price_avg }}</p>
    </li>
{% endfor %}
</ul>

The difference to the annotate/aggregate basic Django queries is the use of the attributes of a related field, e.g. book.author.last_name.

If you need the PKs of the instances that have been grouped together, add the following annotation:

.annotate(pks=ArrayAgg('id'))

NOTE: ArrayAgg is a Postgres specific function, available from Django 1.9 onwards: https://docs.djangoproject.com/en/3.2/ref/contrib/postgres/aggregates/#arrayagg

Risadinha
  • 16,058
  • 2
  • 88
  • 91
  • This [django-group-by](https://github.com/kako-nawao/django-group-by) is an alternative to the `values` method. It's for different purpose I think. – LShi Jul 06 '17 at 09:21
  • 1
    @LShi It's not an alternative to values, of course not. `values` is an SQL `select` while `group_by` is an SQL `group by` (as the name indicates...). Why the downvote? We are using such code in production to implement complex `group_by` statements. – Risadinha Jul 07 '17 at 11:56
  • Its [doc](https://github.com/kako-nawao/django-group-by) says `group_by` "behaves mostly like the values method, but with one difference..." The doc doesn't mention SQL `GROUP BY` and the use case it provides doesn't suggest it has anything to do with SQL `GROUP BY`. I will draw back the down-vote when someone has made this clear, but that doc is really misleading. – LShi Jul 07 '17 at 14:00
  • After reading [the doc for `values`](https://docs.djangoproject.com/en/1.11/topics/db/aggregation/#values), I found I missed that `values` itself works like a GROUP BY. It's my fault. I think it's simpler to use `itertools.groupby` than this django-group-by when `values` is insufficient. – LShi Jul 07 '17 at 15:37
  • 1
    It is impossible to do the `group by` from above with a simple `values` call -with or without `annotate` and without fetching everything from the database. Your suggestion of `itertools.groupby` works for small datasets but not for several thousands of datasets that you probably want to page. Of course, at that point you'll have to think about a special search index that contains prepared (already grouped) data, anyway. – Risadinha Jul 08 '17 at 17:24
  • [The doc for `values`](https://docs.djangoproject.com/en/1.11/topics/db/aggregation/#values) is a little different from what I experienced in Django 1.8. (Perhaps the version matters here, or I haven't understood.) [This question](https://stackoverflow.com/questions/37851053/django-query-group-by-month) demostrates a typical usage, which requires an `order_by` (without which the result would need futher processing). – LShi Jul 09 '17 at 01:34
  • Sorry, my previous comment is misleading where I write "from above". With that I am refering to the example in my own answer which I offered to help others with more complex use cases. – Risadinha Jul 10 '17 at 09:21
7

The documentation says that you can use values to group the queryset .

class Travel(models.Model):
    interest = models.ForeignKey(Interest)
    user = models.ForeignKey(User)
    time = models.DateTimeField(auto_now_add=True)

# Find the travel and group by the interest:

>>> Travel.objects.values('interest').annotate(Count('user'))
<QuerySet [{'interest': 5, 'user__count': 2}, {'interest': 6, 'user__count': 1}]>
# the interest(id=5) had been visited for 2 times, 
# and the interest(id=6) had only been visited for 1 time.

>>> Travel.objects.values('interest').annotate(Count('user', distinct=True)) 
<QuerySet [{'interest': 5, 'user__count': 1}, {'interest': 6, 'user__count': 1}]>
# the interest(id=5) had been visited by only one person (but this person had 
#  visited the interest for 2 times

You can find all the books and group them by name using this code:

Book.objects.values('name').annotate(Count('id')).order_by() # ensure you add the order_by()

You can watch some cheat sheet here.

djvg
  • 11,722
  • 5
  • 72
  • 103
ramwin
  • 5,803
  • 3
  • 27
  • 29
4

You need to do custom SQL as exemplified in this snippet:

Custom SQL via subquery

Or in a custom manager as shown in the online Django docs:

Adding extra Manager methods

Van Gale
  • 43,536
  • 9
  • 71
  • 81
  • 1
    Kind of round-trip solution. I would have used it, if i had some extended use of that. But here i just need the number of members per designation thats all. – simplyharsh Mar 10 '09 at 11:22
  • No problem. I thought about mentioning 1.1 aggregation features but made the assumption you were using the release version :) – Van Gale Mar 10 '09 at 11:26
  • It's all about using raw queries, which show the weakness of Django's ORM. – Sławomir Lenart Mar 02 '18 at 12:46
1

If, in other words, you need to just "remove duplicates" based on some field, and otherwise just to query the ORM objects as they are, I came up with the following workaround:

from django.db.models import OuterRef, Exists

qs = Members.objects.all()
qs = qs.annotate(is_duplicate=Exists(
    Members.objects.filter(
        id__lt=OuterRef('id'),
        designation=OuterRef('designation')))
qs = qs.filter(is_duplicate=False)

So, basically we're just annotating the is_duplicate value by using some convenient filtering (which might vary based on your model and requirements), and then simply using that field to filter out the duplicates.

Raekkeri
  • 761
  • 6
  • 6
1

If you want the model objects, and not just plain values or dictionaries, you can do something like this:

members = Member.objects.filter(foobar=True)
designations = Designation.objects.filter(member__in=members).order_by('pk').distinct()

Replace member__in with the lowercase version of your model name, followed by __in. For example, if your model name is Car, use car__in.

Flimm
  • 136,138
  • 45
  • 251
  • 267
1

This is little complex, but get questioner what he/she expected with only one DB hit.

from django.db.models import Subquery, OuterRef

member_qs = Members.objects.filter(
    pk__in = Members.objects.values('designation').distinct().annotate(
        pk = Subquery(
          Members.objects.filter(
            designation= OuterRef("designation")
        )
        .order_by("pk") # you can set other column, e.g. -pk, create_date...
        .values("pk")[:1]
        ) 
    )
   .values_list("pk", flat=True)
)
rumbarum
  • 803
  • 7
  • 7
1

For some reason, the above mentioned solutions did not work for me. This is what worked:

dupes_query = MyModel.objects.all().values('my_field').annotate(
    count=Count('id')
).order_by('-count').filter(count__gt=1)

I hope it helps.

Özer
  • 2,059
  • 18
  • 22
-6
from django.db.models import Sum
Members.objects.annotate(total=Sum(designation))

first you need to import Sum then ..