
Imagine we have the Django ORM model Meetup with the following definition:

class Meetup(models.Model):
    language = models.CharField()
    speaker = models.CharField()
    date = models.DateField(auto_now=True)

I'd like to use a single query to fetch the language, speaker and date for the latest event for each language.

>>> Meetup.objects.create(language='python', speaker='mike')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='python', speaker='ryan')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='noah')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='shawn')
<Meetup: Meetup object>
>>> Meetup.objects.values("language").annotate(latest_date=models.Max("date")).values("language", "speaker", "latest_date")
[
    {'speaker': u'mike', 'language': u'python', 'latest_date': ...}, 
    {'speaker': u'ryan', 'language': u'python', 'latest_date': ...}, 
    {'speaker': u'noah', 'language': u'node', 'latest_date': ...}, 
    {'speaker': u'shawn', 'language': u'node', 'latest_date': ...}, 
]

D'oh! We're getting the latest event, but for the wrong grouping!

It seems like I need a way to GROUP BY the language but SELECT on a different set of fields?


Update - this sort of query seems fairly easy to express in SQL:

SELECT language, speaker, MAX(date)
FROM app_meetup
GROUP BY language;

I'd love a way to do this without using Django's raw() - is it possible?

Update 2 - after much searching, it seems there are similar questions on SO:

Update 3 - in the end, with @danihp's help, it seems the best you can do is two queries. I've used the following approach:

from django.db.models import Max

# Abuse the fact that the latest Meetup always has a higher PK to build
# a values_list() queryset of the latest Meetup PK for each "language".
latest_meetup_pks = (Meetup.objects.values("language")
                                   .annotate(latest_pk=Max("pk"))
                                   .values_list("latest_pk", flat=True))

# Use a second query to grab those latest Meetups!
Meetup.objects.filter(pk__in=latest_meetup_pks)
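
For comparison, the same "highest PK per language" idea can be sketched in plain SQL as a nested query. This is only an illustration using Python's built-in sqlite3 module, not what Django necessarily emits: the table name app_meetup comes from the SQL earlier in the question, while the id column, the dates, and the variable names are assumptions made up for the example.

```python
import sqlite3

# In-memory database standing in for the real one (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, language TEXT, speaker TEXT, date TEXT)"
)

# Hypothetical sample rows; the dates are invented for illustration.
rows = [
    ("python", "mike", "2013-07-01"),
    ("python", "ryan", "2013-07-08"),
    ("node", "noah", "2013-07-02"),
    ("node", "shawn", "2013-07-09"),
]
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)", rows
)

# Inner query: highest id per language. Outer query: the full rows for those ids.
latest_by_pk = conn.execute(
    """
    SELECT language, speaker, date
    FROM app_meetup
    WHERE id IN (SELECT MAX(id) FROM app_meetup GROUP BY language)
    ORDER BY language
    """
).fetchall()

print(latest_by_pk)
```

Like the ORM version, this only gives the latest meetup if rows are always inserted in chronological order.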

This question is a follow up to my previous question:

Django ORM - Get latest record for group

jb.
  • Bummer that this is MySQL. In postgres, you can directly use DISTINCT ON to get the latest by group [shameless plug for my answer on another question](http://stackoverflow.com/a/20129229/1309332). – dbn Jun 27 '14 at 22:13

1 Answer

This is the kind of query that is easy to explain but hard to write. If this were SQL, I would suggest a CTE-filtered query with a row rank over a partition by language, ordered by date (descending).
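
A rough sketch of that CTE idea, run through Python's built-in sqlite3 module so it can be tried without a Django project. It uses ROW_NUMBER() as the ranking function, which is one possible choice, and it assumes an SQLite build with window-function support (3.25 or newer). The table layout and the sample dates are invented for the example.

```python
import sqlite3

# In-memory database with a hypothetical copy of the meetup table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2013-07-01"),
        ("python", "ryan", "2013-07-08"),
        ("node", "noah", "2013-07-02"),
        ("node", "shawn", "2013-07-09"),
    ],
)

# Number the rows inside each language from newest to oldest,
# then keep only the first row of every partition.
ranked_latest = conn.execute(
    """
    WITH ranked AS (
        SELECT language, speaker, date,
               ROW_NUMBER() OVER (
                   PARTITION BY language ORDER BY date DESC
               ) AS rn
        FROM app_meetup
    )
    SELECT language, speaker, date
    FROM ranked
    WHERE rn = 1
    ORDER BY language
    """
).fetchall()

print(ranked_latest)
```

With ROW_NUMBER(), two meetups on the same latest date would yield a single, arbitrarily chosen row per language; RANK() would return both.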

But this is not SQL; this is the Django query API. The easy way is to run one query per language:

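# values_list(..., flat=True) yields plain language strings; the empty
# order_by() clears any default ordering so distinct() works as expected.
languages = Meetup.objects.values_list("language", flat=True).order_by().distinct()
last_by_language = [
    Meetup.objects.filter(language=l).latest("date")
    for l in languages
]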

This crashes if some language has no meetups, because `latest()` raises `DoesNotExist` when nothing matches. The other approach is to get the max date for each language:

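from functools import reduce  # a builtin on Python 2, must be imported on Python 3

from django.db.models import Max, Q

# One row per language with its latest date.
last_dates = (
    Meetup.objects
    .values("language")
    .annotate(ldate=Max("date"))
    .order_by()
)

# OR together one (language AND latest date) condition per language.
q = reduce(
    lambda acc, meetup: acc | (Q(language=meetup["language"]) & Q(date=meetup["ldate"])),
    last_dates,
    Q(),
)

your_query = Meetup.objects.filter(q)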

Perhaps someone can explain how to do it in a single query without raw SQL.

Edited in response to the OP's comment

You are looking for:

"SELECT language, speaker, MAX(date) FROM app_meetup GROUP BY language"

Not every RDBMS accepts this statement, because every column in the SELECT clause that is not wrapped in an aggregate function must also appear in the GROUP BY clause. In your case, speaker is in the SELECT clause (without an aggregate function) but does not appear in GROUP BY.

In MySQL there is no guarantee that the speaker returned is the one that matches the max date. Because of this, we are not facing an easy query.

Quoting MySQL docs:

In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause...However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group.
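
One form that does not rely on that extension is to compute the max date per language in a grouped subquery and join it back to the table. The sketch below runs it through Python's built-in sqlite3 module; the table layout and the sample rows (including a deliberate tie on the latest node date) are assumptions made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical copy of the meetup table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2013-07-01"),
        ("python", "ryan", "2013-07-08"),
        ("node", "noah", "2013-07-09"),   # same date as shawn: a tie
        ("node", "shawn", "2013-07-09"),
    ],
)

# The subquery only selects the grouped column and an aggregate,
# so every selected column is either grouped or aggregated.
latest_by_date = conn.execute(
    """
    SELECT m.language, m.speaker, m.date
    FROM app_meetup AS m
    JOIN (
        SELECT language, MAX(date) AS ldate
        FROM app_meetup
        GROUP BY language
    ) AS latest
      ON latest.language = m.language AND latest.ldate = m.date
    ORDER BY m.language, m.speaker
    """
).fetchall()

print(latest_by_date)
```

Note that ties are kept: both node meetups on the latest date come back, so this returns at least one row per language rather than exactly one.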

The closest query to your requirements is:

results = (
    Meetup.objects
    .values("language", "speaker")
    .annotate(ldate=models.Max("date"))
    .order_by()
)
dani herrera
  • Using list comprehension with a queryset will generate a database hit for each language. I think your second example is the best way without using raw sql, but you need to join the `Q` objects with `&` instead of `^`. – knbk Jul 26 '13 at 19:25
  • @knbk, thanks for your comments, and thanks for fixing the `and` error. I wrote it from memory, not tested. Also, for a few languages (3 or 4) the first approach is also valid, do you agree? – dani herrera Jul 26 '13 at 20:24
  • Yes, the first approach is also valid for few languages, but even with just 2 distinct languages, you'll generate more queries than with the second method. – knbk Jul 27 '13 at 06:38
  • I appreciate the answer, but I'm looking to do this using a single query! It seems? trivially easy to express the query in SQL: SELECT language, speaker, MAX(date) FROM app_meetup GROUP BY language; I feel like there should be some way to make it happen in Django without resorting to `.raw()` :| – jb. Jul 28 '13 at 01:43
  • @danihp well that sucks :) I'll extend my question with additional information I found - at this point, I think the answer is "Depending on your DB, you probably can't." – jb. Jul 28 '13 at 18:17