84

I have two models A and B. All B objects have a foreign key to an A object. Given a set of A objects, is there anyway to use the ORM to get a set of B objects containing the most recent object created for each A object.

Here's an simplified example:

class Bakery(models.Model):
    town = models.CharField(max_length=255)

class Cake(models.Model):
    bakery = models.ForeignKey(Bakery, on_delete=models.CASCADE)
    baked_at = models.DateTimeField()

So I'm looking for a query that returns the most recent cake baked in each bakery in Anytown, USA.

funnydman
  • 9,083
  • 4
  • 40
  • 55
Zach
  • 18,594
  • 18
  • 59
  • 68

7 Answers7

41

Starting from Django 1.11 and thanks to Subquery and OuterRef, we can finally build a latest-per-group query using the ORM.

hottest_cakes = Cake.objects.filter(
    baked_at=Subquery(
        (Cake.objects
            .filter(bakery=OuterRef('bakery'))
            .values('bakery')
            .annotate(last_bake=Max('baked_at'))
            .values('last_bake')[:1]
        )
    )
)

#BONUS, we can now use this for prefetch_related()
bakeries = Bakery.objects.all().prefetch_related(
    Prefetch('cake_set',
        queryset=hottest_cakes,
        to_attr='hottest_cakes'
    )
)

#usage
for bakery in bakeries:
    print 'Bakery %s has %s hottest_cakes' % (bakery, len(bakery.hottest_cakes))
Dan Swain
  • 2,910
  • 1
  • 16
  • 36
Todor
  • 15,307
  • 5
  • 55
  • 62
  • This worked perfectly, though my use-case was slightly different. What I like about this approach is 1) it keeps the resulting queryset in the target model instance, and 2) it doesn't exclude model instances that don't have related data (in the context of the question, bakeries that haven't baked anything yet). – Supra621 Sep 07 '19 at 16:12
  • 1
    you are smartest guy – yang zhou Apr 01 '20 at 02:31
  • So this gives all the hotests Cakes as a QuerySet. Very interesting. I also needed this, but then needed to filter on them in the bakery, based on an aggregate ('cake__topping'). It seems to work fine. – gabn88 Mar 08 '21 at 14:12
  • If you want multiple fields at once, see the answer here (Django 3.2+): https://stackoverflow.com/a/73088863/6063532 – suayip uzulmez Jul 23 '22 at 07:40
  • @suayipuzulmez this looks interesting, i will definitely give it a try :) – Todor Jul 29 '22 at 05:22
39

As far as I know, there is no one-step way of doing this in Django ORM, but you can split it into two queries:

from django.db.models import Max

bakeries = Bakery.objects.annotate(
    hottest_cake_baked_at=Max('cake__baked_at')
) 
hottest_cakes = Cake.objects.filter(
    baked_at__in=[b.hottest_cake_baked_at for b in bakeries]
)

If id's of cakes are progressing along with bake_at timestamps, you can simplify and disambiguate the above code (in case two cakes arrives at the same time you can get both of them):

from django.db.models import Max

hottest_cake_ids = Bakery.objects.annotate(
    hottest_cake_id=Max('cake__id')
).values_list('hottest_cak‌​e_id', flat=True)

hottest_cakes = Cake.objects.filter(id__in=hottest_cake_ids)

BTW credits for this goes to Daniel Roseman, who once answered similar question of mine:

http://groups.google.pl/group/django-users/browse_thread/thread/3b3cd4cbad478d34/3e4c87f336696054?hl=pl&q=

If the above method is too slow, then I know also second method - you can write custom SQL producing only those Cakes, that are hottest in relevant Bakeries, define it as database VIEW, and then write unmanaged Django model for it. It's also mentioned in the above django-users thread. Direct link to the original concept is here:

http://web.archive.org/web/20130203180037/http://wolfram.kriesing.de/blog/index.php/2007/django-nice-and-critical-article#comment-48425

Hope this helps.

Dan Swain
  • 2,910
  • 1
  • 16
  • 36
Tomasz Zieliński
  • 16,136
  • 7
  • 59
  • 83
  • I'll probably go with the second set of queries you suggested. Thanks. – Zach Jan 18 '10 at 16:08
  • This would be more efficient is you used a value_list for the first query:hottest_cake_ids = Bakery.objects.annotate(hottest_cake_id=Max('cake__id')).values_list('hottest_cake_id', flat=True); hottest_cakes = Cake.objects.filter(id__in=hottest_cake_ids) – dbn Jun 27 '14 at 22:19
  • Also, if you happen to be using PostGreSQL, there is a one step solution. – dbn Jan 30 '15 at 12:34
  • 2
    Doesn't the first solution create an issue where the the latest date for one is before the latest date of another, but exists in another? A = [1, 2, 3], B = [1, 2]. A latest = 3, B latest = 2. The first query seems to get A's 2 and 3, as well as B's 2. – kaungst Mar 27 '15 at 19:41
  • 1
    Starting from `Django 1.11` now there is a one-way step of doing it. Check my answer. – Todor May 11 '17 at 22:17
24

If you happen to be using PostGreSQL, you can use Django's interface to DISTINCT ON:

recent_cakes = Cake.objects.order_by('bakery__id', '-baked_at').distinct('bakery__id')

As the docs say, you must order by the same fields that you distinct on. As Simon pointed out below, if you want to do additional sorting, you'll have to do it in Python-space.

dbn
  • 13,144
  • 3
  • 60
  • 86
  • Love the approach - thanks. Just did a minor fix concerning the final ordering. Depending on the total size of the QS, this may be better or worse than the accepted answer. In my case: better :) – Simon Steinberger Jan 24 '15 at 11:48
  • I think that is an unnecessary complication to the code, and goes beyond what is answered. I'm going to assume that folks can figure out how to sort the resulting data. – dbn Jan 30 '15 at 12:29
  • I played a lot with similar problem, trying `Max` annotations and filtering on them, but they were finally failing on db-side because of improper sql after django optimizer removes order_by (when using result as filter subquery or when aggregating, ex `.count()`). This solution doesn't break all the things when fetching `recent_cakes.count()` and don't throw errors when doing `Cake.objects.filter(pk__in=recent_cackes).filter(other_conditions)`,but the latest example return _random_ cakes per bakery that satisfy other_conditions (not hottest!), because django removes `order_by` from subquery :( – Ivan Klass Apr 01 '15 at 04:48
  • yeah, for that reason, I think that if you aren't using postGreSQL, Tomasz Zielinski's answer is the way to go. – dbn Apr 03 '15 at 23:15
  • Doesn't this order primarily by the bakery_id order, not the baked at date, which throws things off? – Chris Barry Jan 14 '21 at 22:59
5

This should do the job:

from django.db.models import Max
Bakery.objects.annotate(Max('cake__baked_at'))
Daniel Roseman
  • 588,541
  • 66
  • 880
  • 895
  • 6
    I haven't tested yet, but that looks like it will annotate the time that each bakery most recently baked a cake. I'm looking for for the actual cake objects. Am I misinterpreting your answer? – Zach Jan 15 '10 at 21:11
  • Yes, you're right. I had forgotten the previous answer I posted for Tomasz :-) – Daniel Roseman Jan 16 '10 at 10:48
  • 1
    I believe this will only work if sorting of cakes by ids and by date result the same ordering. In a generic case where the primary key sequence does not correspond to the chronological order as defined by the date field, this will not work. – Dmitry B. May 02 '11 at 23:46
4

I was fighting with similar problem and finally come to following solution. It does not rely on order_by and distinct so can be sorted as desired on db-side and also can be used as nested query for filtering. I also believe this implementation is db engine independent, because it's based on standard sql HAVING clause. The only drawback is that it will return multiple hottest cakes per bakery, if they are baked in that bakery at exactly same time.

from django.db.models import Max, F

Cake.objects.annotate(
    # annotate with MAX "baked_at" over all cakes in bakery
    latest_baketime_in_bakery=Max('bakery__cake_set__baked_at')
    # compare this cake "baked_at" with annotated latest in bakery
).filter(latest_baketime_in_bakery__eq=F('baked_at'))
Ivan Klass
  • 6,407
  • 3
  • 30
  • 28
0
Cake.objects.filter(bakery__town="Anytown").order_by("-created_at")[:1]

I haven't built out the models on my end, but in theory this should work. Broken down:

  • Cake.objects.filter(bakery__town="Anytown") Should return any cakes whom belong to "Anytown", assuming the country is not part of the string. The double underscores between bakery and town allow us to access the town property of bakery.
  • .order_by("-created_at") will order the results by their created date, most recent first (take note of the - (minus) sign in "-created_at". Without the minus sign, they'd be ordered by oldest to most recent.
  • [:1] on the end will return only the 1st item in the list which is returned (which would be a list of cakes from Anytown, sorted by most recent first).

Note: This answer is for Django 1.11. This answer modified from Queries shown here in Django 1.11 Docs.

twknab
  • 1,741
  • 1
  • 21
  • 35
0

@Tomasz Zieliński solution above did solve your problem but it did not solve mine, because I still need to filter the Cake. So here is my solution

from django.db.models import Q, Max

hottest_yellow_round_cake = Max('cake__baked_at', filter=Q(cake__color='yellow', cake__shape='round'))

bakeries = Bakery.objects.filter(town='Chicago').annotate(
    hottest_cake_baked_at=hottest_yellow_round_cake
)

hottest_cakes = Cake.objects.filter(
    baked_at__in=[b.hottest_cake_baked_at for b in bakeries]
)

With this approach, you can also implement other things like Filter, Ordering, Pagination for Cakes

Nguyen Anh Vu
  • 121
  • 1
  • 2
  • 4