This solution is optimized for memory requirements, as you expect it important. It needs three queries. The first query asks for posts, the second query only for tuples (id, post_id). The third for details of filtered latest comments.
from itertools import groupby, islice
posts = Post.objects.filter(...some your flter...)
# sorted by date or by id
all_comments = (Comment.objects.filter(post__in=posts).values('post_id')
.order_by('post_id', '-pk'))
last_comments = []
# the queryset is evaluated now. Only about 100 itens chunks are in memory at
# once during iterations.
for post_id, related_comments in groupby(all_comments(), lambda x: x.post_id):
last_comments.extend(islice(related_comments, 2))
results = {}
for comment in Comment.objects.filter(pk__in=last_comments):
results.setdefault(comment.post_id, []).append(comment)
# output
for post in posts:
print post.title, [x.comment for x in results[post.id]]
But I think it will be faster for many database backends to combine the second and the third query into one and so to ask immediately for all fields of comments. Unuseful comments will be forgotten immediately.
The fastest solution would be with nested queries. The algorithm is like the one above, but everything is realized by raw SQL. It is limited only to some backends like PostgresQL.
EDIT
I agree that is not useful for you
... prefetch loads into memory thousands of comments, 99% of which will not be shown.
and therefore I wrote that relatively complicated solution that 99% of them will be read continuously without loading into memory.
EDIT
- All examples are for the condition that you wand post_id in [1, 3, 5] (enything selected earlier by categories etc.)
- In all cases create the index for Comments on fields ['post', 'pk']
A) Nested query for PostgresQL
SELECT post_id, id, text FROM
(SELECT post_id, id, text, rank() OVER (PARTITION BY post_id ORDER BY id DESC)
FROM app_comment WHERE post_id in (1, 3, 5)) sub
WHERE rank <= 2
ORDER BY post_id, id
Or explicitely require with less memory if we don't believe the optimizer. It should read data only from index in two inner selects, which is much less data than from the table.:
SELECT post_id, id, text FROM app_comment WHERE id IN
(SELECT id FROM
(SELECT id, rank() OVER (PARTITION BY post_id ORDER BY id DESC)
FROM app_comment WHERE post_id in (1, 3, 5)) sub
WHERE rank <= 2)
ORDER BY post_id, id
B) With a cached ID of the oldest displayed comment
Add field "oldest_displayed" to Post
class Post(models.Model):
oldest_displayed = models.IntegerField()
Filter comments for pk if interesting posts (that you have selected earlier by categories etc.)
Filter
from django.db.models import F
qs = Comment.objects.filter(
post__pk__in=[1, 3, 5],
post__oldest_displayed__lte=F('pk')
).order_by('post_id', 'pk')
pprint.pprint([(x.post_id, x.pk) for x in qs])
Hmm, very nice ... and how it is compiled by Django?
>>> print(qs.query.get_compiler('default').as_sql()[0]) # added white space
SELECT "app_comment"."id", "app_comment"."text", "app_comment"."post_id"
FROM "app_comment"
INNER JOIN "app_post" ON ( "app_comment"."post_id" = "app_post"."id" )
WHERE ("app_comment"."post_id" IN (%s, %s, %s)
AND "app_post"."oldest_displayed" <= ("app_comment"."id"))
ORDER BY app_comment"."post_id" ASC, "app_comment"."id" ASC
Prepare all "oldest_displayed" by one nested SQL initially (and set zero for posts with less than two comments):
UPDATE app_post SET oldest_displayed = 0
UPDATE app_post SET oldest_displayed = qq.id FROM
(SELECT post_id, id FROM
(SELECT post_id, id, rank() OVER (PARTITION BY post_id ORDER BY id DESC)
FROM app_comment ) sub
WHERE rank = 2) qq
WHERE qq.post_id = app_post.id;