Before measuring performance and comparing explains
from different queries it usually worth to try and mimic your production setup first or you may (and almost certainly will) get a misleading EXPLAIN path (e.g. seq scan
is used instead of index scan
on small tables as sequential will be faster than random IO in such case)
I tried to tackle it in this way:
First - generate 200K conversations within the past 30 days
insert into conversations(id, name, created_at)
select
generate_series,
'CONVERSATION_'||generate_series,
NOW() - (random() * (interval '30 days'))
from generate_series(1, 200000);
Second - generate 2M messages randomly distributed among 200K conversations
and then also intentionally created 5 more "most recent" messages for conversation with ID=999, so that the conversation 999 has to always appear on top of query result.
insert into messages(id, content, conversation_id, created_at)
select msg.id, content, conversation_id, created_at + (random() * (interval '7 days')) from (
select distinct
generate_series as id,
'Message content ' || generate_series as content,
1 + trunc(random() * 200000) as conversation_id
from generate_series(1, 2000000)
) msg
join conversations c on c.id = conversation_id;
insert into messages(id, content, conversation_id, created_at)
select
generate_series as id,
'Message content ' || generate_series as content,
999 as conversation_id,
now() + interval '7 day' + (random() * (interval '7 days'))
from generate_series(2000001, 2000006);
And now you can try and compare (now with a little bit more confidence) those EXPLAINs to see which query works better.
Assuming you added the proposed index
CREATE INDEX idx1 ON messages(conversation_id, created_at desc)
- Both @GoonerForLife and @asinkxcoswt answers are pretty good, though the result is moderate because of window function usage
with cost=250000 on average and 2 to 3 seconds execution time on my machine
- @SalmanA and @ESG answers are twice as fast even though the
lateral join
will force query planner to choose sequential scan (this is inevitable as the join
is on TRUE
, so no index could be used)
with cost ~150000 on average and 1-1.5 seconds execution time on my machine
It may not be obvious at first, but the @ElapsedSoul's answers with NOT EXISTS is quite close to ideal (though it still needs a couple of tweaks):
(1) It lacks the order by and limit:
select
conversations.id,
m.content AS last_message_content,
m.created_at AS last_message_at
from conversations,messages m
where conversations.id = m.conversation_id and not exists (
select 1 from messages n
where m.conversation_id = n.conversation_id and m.created_at < n.created_at
) order by last_message_at desc
limit 15
And (2) Since there is date-to-date comparison inside the NOT EXISTS subquery - we have to add another index on massages table
CREATE INDEX ix2 ON messages(created_at desc);
After that we should get a decent performance gain.
For example on my machine it resulted in 0.036ms
execution time and 20.07 cost