
I've got a query with an ORDER BY and a LIMIT to support a paginated interface:

SELECT segment_members.id AS t0_r0,
       segment_members.segment_id AS t0_r1,
       segment_members.account_id AS t0_r2,
       segment_members.score AS t0_r3,
       segment_members.created_at AS t0_r4,
       segment_members.updated_at AS t0_r5,
       segment_members.posts_count AS t0_r6,
       accounts.id AS t1_r0,
       accounts.platform AS t1_r1,
       accounts.username AS t1_r2,
       accounts.created_at AS t1_r3,
       accounts.updated_at AS t1_r4,
       accounts.remote_id AS t1_r5,
       accounts.name AS t1_r6,
       accounts.language AS t1_r7,
       accounts.description AS t1_r8,
       accounts.timezone AS t1_r9,
       accounts.profile_image_url AS t1_r10,
       accounts.post_count AS t1_r11,
       accounts.follower_count AS t1_r12,
       accounts.following_count AS t1_r13,
       accounts.uri AS t1_r14,
       accounts.location AS t1_r15,
       accounts.favorite_count AS t1_r16,
       accounts.raw AS t1_r17,
       accounts.followers_completed_at AS t1_r18,
       accounts.followings_completed_at AS t1_r19,
       accounts.followers_started_at AS t1_r20,
       accounts.followings_started_at AS t1_r21,
       accounts.profile_fetched_at AS t1_r22,
       accounts.managed_source_id AS t1_r23
FROM segment_members
INNER JOIN accounts ON accounts.id = segment_members.account_id
WHERE segment_members.segment_id = 1
ORDER BY accounts.follower_count ASC LIMIT 20
OFFSET 0;

Here are the indexes on the tables:

accounts
"accounts_pkey" PRIMARY KEY, btree (id)
"index_accounts_on_remote_id_and_platform" UNIQUE, btree (remote_id, platform)
"index_accounts_on_description" btree (description)
"index_accounts_on_favorite_count" btree (favorite_count)
"index_accounts_on_follower_count" btree (follower_count)
"index_accounts_on_following_count" btree (following_count)
"index_accounts_on_lower_username_and_platform" btree (lower(username::text), platform)
"index_accounts_on_post_count" btree (post_count)
"index_accounts_on_profile_fetched_at_and_platform" btree (profile_fetched_at, platform)
"index_accounts_on_username" btree (username)

segment_members
"segment_members_pkey" PRIMARY KEY, btree (id)
"index_segment_members_on_segment_id_and_account_id" UNIQUE, btree (segment_id, account_id)
"index_segment_members_on_account_id" btree (account_id)
"index_segment_members_on_segment_id" btree (segment_id)

In my development and staging databases, the query plan looks like the following, and the query executes very quickly.

 Limit  (cost=4802.15..4802.20 rows=20 width=2086)
   ->  Sort  (cost=4802.15..4803.20 rows=421 width=2086)
         Sort Key: accounts.follower_count
         ->  Nested Loop  (cost=20.12..4790.95 rows=421 width=2086)
               ->  Bitmap Heap Scan on segment_members  (cost=19.69..1244.24 rows=421 width=38)
                     Recheck Cond: (segment_id = 1)
                      ->  Bitmap Index Scan on index_segment_members_on_segment_id_and_account_id  (cost=0.00..19.58 rows=421 width=0)
                           Index Cond: (segment_id = 1)
               ->  Index Scan using accounts_pkey on accounts  (cost=0.43..8.41 rows=1 width=2048)
                     Index Cond: (id = segment_members.account_id)

In production, however, the query plan is the following, and the query takes forever (several minutes until it hits the statement timeout).

 Limit  (cost=0.86..25120.72 rows=20 width=2130)
   ->  Nested Loop  (cost=0.86..4614518.64 rows=3674 width=2130)
         ->  Index Scan using index_accounts_on_follower_count on accounts  (cost=0.43..2779897.53 rows=3434917 width=2092)
         ->  Index Scan using index_segment_members_on_segment_id_and_account_id on segment_members  (cost=0.43..0.52 rows=1 width=38)
               Index Cond: ((segment_id = 1) AND (account_id = accounts.id))

accounts has about 6m rows in staging and 3m in production. segment_members has about 300k rows in staging and 4m in production. Is it the differences in table sizes that is causing the differences in the query plan selection? Is there any way I can get Postgres to use the faster query plan in production?

Update: Here's the EXPLAIN ANALYZE from the slow production server:

 Limit  (cost=0.86..22525.66 rows=20 width=2127) (actual time=173.148..187568.247 rows=20 loops=1)
   ->  Nested Loop  (cost=0.86..4654749.92 rows=4133 width=2127) (actual time=173.141..187568.193 rows=20 loops=1)
         ->  Index Scan using index_accounts_on_follower_count on accounts  (cost=0.43..2839731.81 rows=3390197 width=2089) (actual time=0.110..180374.279 rows=1401278 loops=1)
         ->  Index Scan using index_segment_members_on_segment_id_and_account_id on segment_members  (cost=0.43..0.53 rows=1 width=38) (actual time=0.003..0.003 rows=0 loops=1401278)
               Index Cond: ((segment_id = 1) AND (account_id = accounts.id))
 Total runtime: 187568.318 ms
(6 rows)
– sbleon

1 Answer


Either your table statistics are not up to date or the two queries you present are very different. The second one expects to retrieve 3.5M rows (rows=3434917). ORDER BY / LIMIT 20 then has to sort all 3.5 million rows to find the top 20, which is going to be extremely expensive unless a matching index can deliver the rows pre-sorted.
The first query plan expects to sort 421 rows. Not even close. Different query plans are no surprise.
It would be interesting to see the output of EXPLAIN ANALYZE, not just EXPLAIN. (Expensive for the second query!)
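
For example, a condensed but equivalent form of the query above could be profiled like this (the table aliases are mine; the BUFFERS option is optional but often telling):

EXPLAIN (ANALYZE, BUFFERS)
SELECT sm.*, a.*
FROM   segment_members sm
JOIN   accounts a ON a.id = sm.account_id
WHERE  sm.segment_id = 1
ORDER  BY a.follower_count
LIMIT  20;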

It very much depends on how many account_id there are per segment_id. If segment_id is not selective, the query cannot be fast. Your only other option is a MATERIALIZED VIEW with the top n rows per segment_id and an appropriate regime to keep it up to date, as sketched below.
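
A minimal sketch of that idea (the view name and the per-segment cutoff of 20 are mine; materialized views require Postgres 9.3 or later):

CREATE MATERIALIZED VIEW top_segment_members AS
SELECT segment_id, account_id, follower_count, rn
FROM  (
   SELECT sm.segment_id, sm.account_id, a.follower_count
        , row_number() OVER (PARTITION BY sm.segment_id
                             ORDER BY a.follower_count) AS rn
   FROM   segment_members sm
   JOIN   accounts a ON a.id = sm.account_id
   ) sub
WHERE  rn <= 20;

CREATE UNIQUE INDEX ON top_segment_members (segment_id, rn);

-- re-run whenever the base tables have changed enough to matter:
REFRESH MATERIALIZED VIEW top_segment_members;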

If your statistics are not up to date, just run ANALYZE on both tables and retry.
It might help to increase the statistics target for selected columns:

ALTER TABLE segment_members ALTER segment_id SET STATISTICS 1000;
ALTER TABLE segment_members ALTER account_id SET STATISTICS 1000;

ALTER TABLE accounts ALTER id             SET STATISTICS 1000;
ALTER TABLE accounts ALTER follower_count SET STATISTICS 1000;

ANALYZE segment_members (segment_id, account_id);
ANALYZE accounts (id, follower_count);
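
To check whether autovacuum really has analyzed the tables recently, you can query the standard statistics view pg_stat_user_tables:

SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname IN ('segment_members', 'accounts');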


Better indexes

In addition to your existing UNIQUE constraint index_segment_members_on_segment_id_and_account_id on segment_members, I suggest a multicolumn index on accounts:

CREATE INDEX index_accounts_on_id_and_follower_count ON accounts (id, follower_count);

Again, run ANALYZE after creating the index.

Some indexes useless?

All other indexes in your question are irrelevant for this query. They may be useful for other purposes, or they may be useless.

This index is 100% dead freight; drop it. It is completely redundant: the existing unique index on (segment_id, account_id) can serve any query that this single-column index could serve, since segment_id is its leading column.

"index_segment_members_on_segment_id" btree (segment_id)

This one may be useless:

"index_accounts_on_description" btree (description)

Since a "description" is typically free text that is hardly used to order rows or in a WHERE condition with a suitable operator. But that's just an educated guess.

– Erwin Brandstetter
  • Thanks for the detailed answer, Erwin! There's only one query. I'm running the same query against 3 databases with identical schemas and getting very different query plans. – sbleon Sep 26 '14 at 15:37
  • I've also got autovacuum enabled, so the statistics should be up to date. I've even tried REINDEXing the tables, but the bad plan doesn't change on my production database. – sbleon Sep 26 '14 at 15:38