3

I have a query on a 500 000 row table.

Basically

WHERE s3_.id = 287
ORDER BY m0_.id DESC
LIMIT 25

=> Query runtime = 20ms

WHERE s3_.id = 287
ORDER BY m0_.created_at DESC
LIMIT 25

=> Query runtime = 15000ms or more

There is an index on created_at.

The Query plans are completely different.

Unfortunately I am not a query plan guru. I would like to reproduce the fast query plan while ordering by created_at.

Is that possible and how would I go about that ?

Query Plan - Slow Query (order by m0_.created_at) : http://explain.depesz.com/s/KBl

Query Plan - Fast Query ( order by m0_.id ): http://explain.depesz.com/s/2pYZ

Full Query

SELECT m0_.id AS id0, m0_.content AS content1, m0_.created_at AS created_at2,
c1_.id AS id3, l2_.id AS id4, l2_.reference AS reference5,
s3_.id AS id6, s3_.name AS name7, s3_.code AS code8,
u4_.email AS email9, u4_.id AS id10, u4_.firstname AS firstname11, u4_.lastname AS lastname12,
u5_.email AS email13, u5_.id AS id14, u5_.firstname AS firstname15, u5_.lastname AS lastname16,
g6_.id AS id17, g6_.firstname AS firstname18, g6_.lastname AS lastname19, g6_.email AS email20,
m0_.conversation_id AS conversation_id21, m0_.author_user_id AS author_user_id22, m0_.author_guest_id AS author_guest_id23,
c1_.author_user_id AS author_user_id24, c1_.author_guest_id AS author_guest_id25, c1_.listing_id AS listing_id26,
l2_.poster_id AS poster_id27, l2_.site_id AS site_id28, l2_.building_id AS building_id29, l2_.type_id AS type_id30, l2_.neighborhood_id AS neighborhood_id31, l2_.facility_bathroom_id AS facility_bathroom_id32, l2_.facility_kitchen_id AS facility_kitchen_id33, l2_.facility_heating_id AS facility_heating_id34, l2_.facility_internet_id AS facility_internet_id35, l2_.facility_condition_id AS facility_condition_id36, l2_.original_translation_id AS original_translation_id37, 
u4_.site_id AS site_id38, u4_.address_id AS address_id39, u4_.billing_address_id AS billing_address_id40,
u5_.site_id AS site_id41, u5_.address_id AS address_id42, u5_.billing_address_id AS billing_address_id43,
g6_.site_id AS site_id44
FROM message m0_
INNER JOIN conversation c1_ ON m0_.conversation_id = c1_.id
INNER JOIN listing l2_ ON c1_.listing_id = l2_.id
INNER JOIN Site s3_ ON l2_.site_id = s3_.id
INNER JOIN user_ u4_ ON l2_.poster_id = u4_.id
LEFT JOIN user_ u5_ ON m0_.author_user_id = u5_.id
LEFT JOIN guest_data g6_ ON m0_.author_guest_id = g6_.id
WHERE s3_.id = 287
ORDER BY m0_.created_at DESC
LIMIT 25 OFFSET 0
Baishu
  • 1,448
  • 12
  • 12
  • 1
    When you post an execution plan make sure you prevent the indention, otherwise it's next to impossible to make sense of them. You can also upload the plans to http://explain.depesz.com –  Jul 21 '14 at 19:19
  • Consider clarifying the title. "Changing ORDER BY" doesn't do it justice. In combination with a small `LIMIT`, you actually make it a *completely* different query with a *completely* different result. Also, add proper query plans with actual times from `EXPLAIN ANALYZE`, not just `EXPLAIN`. Run them through explain.depesz.com like advised by @a_horse. And make the description of the problem fit the query: `m.id`? You mean `s3_.id` or `m0_.id`? – Erwin Brandstetter Jul 21 '14 at 21:34
  • @a_horse_with_no_name Great tip thanks, it's uploaded on explain.depesz.com, the links are in the description. – Baishu Jul 22 '14 at 12:42
  • @ErwinBrandstetter Thanks a lot, all suggested changes are made. – Baishu Jul 22 '14 at 12:43
  • @ErwinBrandstetter I have tried to read the Query plans. I get the jist of it but it's still obscure to me as to how to make the planner do what I'd like. This question seems similar to my issue http://stackoverflow.com/questions/18122115 but does not offer a clear solution. For now, I have resorted back to sorting on ID, I'll get back to this when I have more time. – Baishu Jul 28 '14 at 10:27

3 Answers3

3

Fix your query

Your WHERE condition is on a table that's joined via LEFT JOIN nodes. The WHERE condition forces the joins to behave like [INNER] JOIN. That's pointless and may confuse the query planner, especially with a query that has a lot of tables and therefore many possible query plans. By setting that right, you reduce the number of possible query plans drastically, making it easier for Postgres to find a good one.
More details in the answer to the additionally spawned question.

SELECT m0_.id AS id0, ...
FROM   site            s3_
JOIN   listing         l2_ ON l2_.site_id = s3_.id
JOIN   conversation    c1_ ON c1_.listing_id = l2_.id
JOIN   message         m0_ ON m0_.conversation_id = c1_.id

LEFT   JOIN user_      u4_ ON u4_.id = l2_.poster_id
LEFT   JOIN user_      u5_ ON u5_.id = m0_.author_user_id
LEFT   JOIN guest_data g6_ ON g6_.id = m0_.author_guest_id
WHERE  s3_.id = '287'  -- ??
ORDER  BY m0_.created_at DESC
LIMIT  25

Why s3_.id = '287'?

Looks like 287 should be an integer type, that you would typically enter as numeric constant without quotes: 287. What's the actual data type (and why)? Only a minor problem either way.

Reading the query plan

@FuzzyTree already hinted (quite accurately) that sorting on a different column than what's used in your WHERE clause complicates things. But that's not the elephant in the room here.

The combination with LIMIT 25 makes the difference huge. Both query plans show a reduction from rows=124616 to rows=25 in their last step, which is huge.

Both query plans also show: Seq Scan on site s3_ ... rows=1. So if you ORDER BY _s3.id in your fast variant, you are not actually ordering anything. While the other query has to find the top 25 rows out of 124616 candidates ... Hardly a fair comparison.

Solution

After clarification, the problem seems clearer. You are selecting a huge number of rows by one criteria, but ordering by another. No conventional index design can cover this, not even if both columns were to reside in the same table (which they don't).

I think we found a (non-trivial) solution for this class of problems under this related question on dba.SE:

Of course, all the usual advice for query optimization and general performance optimization applies.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • This question may be helpful: http://stackoverflow.com/questions/24876673/explain-join-vs-left-join-and-where-condition-performance-suggestion-in-more-de/24876947#24876947 – Dwayne Towell Jul 22 '14 at 01:33
  • I fixed the query for the INNER JOINs and the integer type. Unfortunately it does not seem to help much on the issue. Indeed I don't ORDER BY _s3.id but by m0_.id. – Baishu Jul 22 '14 at 12:51
  • @Baishu: Sorry I missed your earlier update. I think there is a way to make this faster, but it's not a simple solution. I added a bit to my answer. – Erwin Brandstetter Jul 28 '14 at 11:09
  • @ErwinBrandstetter The fast query is also selecting by one criteria and ordering by another, isn't it ? 'WHERE s3_.id = 287 ORDER BY m0_.id DESC' – Baishu Jul 28 '14 at 12:59
  • @ErwinBrandstetter It turned out to be simply an index issue (NULLS handling). Sorry for the trouble, I completely missed that. – Baishu Aug 02 '14 at 20:04
3

It turned out to be an index issue. The NULLS behaviour of the query was not coherent with the index.

CREATE INDEX message_created_at_idx on message (created_at DESC NULLS LAST);

... ORDER BY message.created_at DESC; -- defaults to NULLS FIRST when DESC

solutions

If you specify NULLS in your index or query, make sure they are coherent with each other.

ie: ASC NULLS LAST is coherent with ASC NULLS LAST or DESC NULLS FIRST.

NULLS LAST

CREATE INDEX message_created_at_idx on message (created_at DESC NULLS LAST);

... ORDER BY messsage.created_at DESC NULLS LAST;

NULLS FIRST

CREATE INDEX message_created_at_idx on message (created_at DESC); -- defaults to NULLS FIRST when DESC

... ORDER BY messsage.created_at DESC -- defaults to NULLS FIRST when DESC;

NOT NULL

If your column is NOT NULL, don't bother with NULLS.

CREATE INDEX message_created_at_idx on message (created_at DESC);

... ORDER BY messsage.created_at DESC;
Baishu
  • 1,448
  • 12
  • 12
0

In your first query your WHERE and ORDER BY are both on id, so it can take advantage of the same index whereas your second query has different columns for your WHERE and ORDER BY.

Try adding a composite index so the same index can be used for your WHERE and ORDER BY

CREATE INDEX myIndex ON message (id,created_at);
FuzzyTree
  • 32,014
  • 3
  • 54
  • 85
  • The WHERE and the ORDER BY do not concern fields belonging to the same table. Unfortunately I don't think I can create a composite index across tables. – Baishu Jul 22 '14 at 12:55