4

I have the following Django (3.0) ORM model:

class Portfolio(models.Model):
    code = models.CharField(max_length=20)
    name = models.CharField(max_length=100)
    date = models.DateField()
    created = models.DateTimeField(blank=True, null=True, auto_now_add=True)

I want to group them by the code field and for each code only take the portfolio that has the max date

I know I can use the following query:

Portfolio.objects.values('code').annotate(latest=Max('date'))

But it has three problems:

  1. It gives me only the code and latest fields, so I lose the other fields
  2. It gives back a dictionary, while I want a list of actual Portfolio objects
  3. Max works because date is a DateField. It would also work on other numeric field types, but what if I want to order the records by the value of a CharField (lexicographic order), for instance name and take the first record for each group?

So, to sum up, my question: How do I use Django ORM to retrieve a list of ORM Objects grouped by one or more fields and have back only the first record of each group given an arbitrary "order by" clause?

funnydman
  • 9,083
  • 4
  • 40
  • 55
pistacchio
  • 56,889
  • 107
  • 278
  • 420

1 Answers1

8

Getting first / last value from group(s) is exactly what DISTINCT ON with ORDER BY SQL clause can be used for (but afaik only in Postgresql. In MySQL just DISTINCT, no ON, so - not possible(directly), SQLite also does not support DISTINCT ON, just DISTINCT. To signify this, in Django positional arguments to .distinct() can be passed only in Postgresql).

In Django we can do this with QuerySet like this:

Portfolio.objects.order_by().order_by(
    'code', # first, cause we want to group by this value
    '-created' # descending order, latest / max will be first
).distinct('code')

Here we use empty .order_by() call to __clear all ordering already present __ on QuerySet (added or default) to make sure only required ordering is applied with next .order_by(...) for grouping to work correctly.


General way to use it:

  • Starting QuerySet with filters applied - q = SomeModel.objects.filter(col1__gt=2)

  • clear ordering that is already set on the QuerySet - q.order_by()

  • add ordering with the grouping columns first (used in distinct later) with additional columns after - q.order_by('col1', '-col2', '-col3', '-date1')

    Here:

    • 'col1', 'col2', 'col3' - are columns / fields we want to GROUP BY (for grouping)

    • 'col1', '-col2', '-col3' - same columns we want to group by, but with the ordering we want to use for grouping (crucial for all sub-groups - all columns in groups list except the first one - for them it will affect the resulted rows being fetched from groups - 'first' or 'last'; for the first column it will not affect result rows, only result ordering)

    • '-date1' - any additional ordering columns of our choice to order final group rows
  • Finally add .distinct() clause with the fields we choose for grouping as arguments, in the same order as in .order_by() clause - q.distinct('col1', 'col2', 'col3')

Oleg Russkin
  • 4,234
  • 1
  • 8
  • 20