1

I have a model Page, which can have Posts on it. What I want to do is get every Page, plus the most recent Post on that page. If the Page has no Posts, I still want the page. (Sound familiar? This is a LEFT JOIN in SQL).

Here is what I currently have:

Page.objects.annotate(most_recent_post=Max('post__post_time'))

This only gets Pages, but it doesn't get Posts. How can I get the Posts as well?

Models:

class Page(models.Model):
    name = models.CharField(max_length=50)
    created = models.DateTimeField(auto_now_add = True)
    enabled = models.BooleanField(default = True)

class Post(models.Model):
    user = models.ForeignKey(User)
    page = models.ForeignKey(Page)
    post_time = models.DateTimeField(auto_now_add = True)
Ben G
  • 26,091
  • 34
  • 103
  • 170

3 Answers3

2

Depending on the relationship between the two, you should be able to follow the relationships quite easily, and increase performance by using select_related

Taking this:

class Page(models.Model):
    ...

class Post(models.Model):
    page = ForeignKey(Page, ...)

You can follow the forward relationship (i.e. get all the posts and their associated pages) efficiently using select_related:

Post.objects.select_related('page').all()

This will result in only one (larger) query where all the page objects are prefetched.

In the reverse situation (like you have) where you want to get all pages and their associated posts, select_related won't work. See this,this and this question for more information about what you can do.

Community
  • 1
  • 1
Timmy O'Mahony
  • 53,000
  • 18
  • 155
  • 177
  • I'm trying to get only the most recent post. `page.post_set.all` seems to have all of them rather than just that one. how do i get the desired one? – Ben G Feb 16 '12 at 06:49
  • `page.post_set.all().latest()` in the view or `page.post_set.all.latest` in the template. You need to specify a `get_latest_by` field in the `Meta` class of your model: https://docs.djangoproject.com/en/dev/ref/models/querysets/#latest – Timmy O'Mahony Feb 16 '12 at 11:02
  • if you're using that method, what's the point of even getting the `most_recent_post=Max('post__post_time')`? You could get it from `post_set.all.latest.post_time`, no? – Ben G Feb 16 '12 at 19:48
  • Okay, so the problem is it queries every time you run `.all()` or `.latest()` http://pastebin.com/5SKGcSPS – Ben G Feb 16 '12 at 20:20
  • ok, I've updated my answer to clear up a few things that I misunderstood myself – Timmy O'Mahony Feb 16 '12 at 21:01
  • 1
    marking this as correct. I ended up doing a raw SQL query. The various solution for django 1.3 and lower aren't very nice. Can't wait for django 1.4 to introduce `prefetch_related`.. – Ben G Feb 16 '12 at 23:41
1

Probably your best bet is to use the techniques described in the django docs here: Following Links Backward.

After you do:

pages = Page.objects.annotate(most_recent_post=Max('post__post_time'))
posts = [page.post_set.filter(post_time=page.most_recent_post) for page in pages]

And then posts[0] should have the most recent post for pages[0] etc. I don't know if this is the most efficient solution, but this was the solution mentioned in another post about the lack of left joins in django.

  • This works, but if there were a way to do it with fewer queries that would be much appreciated. – Ben G Feb 16 '12 at 06:54
1

You can create a database view that will contain all Page columns alongside with with necessary latest Post columns:

CREATE VIEW `testapp_pagewithrecentpost` AS
    SELECT testapp_page.*, testapp_post.*  -- I suggest as few post columns as possible here
    FROM `testapp_page` LEFT JOIN `testapp_page` 
    ON test_page.id = test_post.page_id 
    AND test_post.post_time = 
        ( SELECT MAX(test_post.post_time) 
          FROM test_post WHERE test_page.id = test_post.page_id );

Then you need to create a model with flag managed = False (so that manage.py sync won't break). You can also use inheritance from abstract Model to avoid column duplication:

class PageWithRecentPost(models.Model):   # Or extend abstract BasePost ?
    # Page columns goes here
    # Post columns goes here
    # We use LEFT JOIN, so all columns from the 
    # 'post' model will need blank=True, null=True 

    class Meta:
        managed = False  # Django will not handle creation/reset automatically

By doing that you can do what you initially wanted, so fetch from both tables in just one query:

pages_with_recent_post = PageWithRecentPost.objects.filter(...)
for page in pages_with_recent_post:
    print page.name        # Page column
    print page.post_time   # Post column

However this approach is not drawback free:

  • It's very DB engine-specific
  • You'll need to add VIEW creation SQL to your project
  • If your models are complex it's very likely that you'll need to resolve table column name clashes.
  • Model based on a database view will very likely be read-only (INSERT/UPDATE will fail).
  • It adds complexity to your project. Allowing for multiple queries is a definitely simpler solution.
  • Changes in Page/Post will require re-creating the view.
Mariusz Jamro
  • 30,615
  • 24
  • 120
  • 162