Why does Postgres not use better index for my query?

Question

I have a table where I keep record of who is following whom on a Twitter-like application:

\d follow
                               Table "public.follow" .
 Column   |           Type           |                      Modifiers
 ---------+--------------------------+-----------------------------------------------------
xid       | text                     |
followee  | integer                  |
follower  | integer                  |
id        | integer                  | not null default nextval('follow_id_seq'::regclass)
createdAt | timestamp with time zone |
updatedAt | timestamp with time zone |
source    | text                     |
Indexes:
  "follow_pkey" PRIMARY KEY, btree (id)
  "follow_uniq_users" UNIQUE CONSTRAINT, btree (follower, followee)
  "follow_createdat_idx" btree ("createdAt")
  "follow_followee_idx" btree (followee)
  "follow_follower_idx" btree (follower)

Number of entries in table is more than a million and when I run explain analyze on the query I get this:

explain analyze SELECT "follow"."follower"
FROM "public"."follow" AS "follow"
WHERE "follow"."followee" = 6
ORDER BY "follow"."createdAt" DESC
LIMIT 15 OFFSET 0;
                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.43..353.69 rows=15 width=12) (actual time=5.456..21.497 
rows=15 loops=1)
->  Index Scan Backward using follow_createdat_idx on follow  (cost=0.43..61585.45 rows=2615 width=12) (actual time=5.455..21.488 rows=15 loops=1)
     Filter: (followee = 6)
     Rows Removed by Filter: 62368
Planning time: 0.068 ms
Execution time: 21.516 ms

Why it is doing backward index scan on follow_createdat_idx where it could have been more faster execution if it had used follow_followee_idx.

This query is taking around 33 ms when running first time and then subsequent calls are taking around 22 ms which I feel are on higher side.

I am using Postgres 9.5 provided by Amazon RDS. Any idea what wrong could be happening here?

Because if it did a lookup on the followee index it would then have to do a sort. If this is the primary use of the follow_followee index you might want to try adding createdAt as a second field on that index. — user1937198, May 07 '17 at 16:13
@user1937198 After I did this, computation time went down from 20 ms to 2ms. So it worked. Any performance implications If I leave both `"follow_createdat_idx" btree ("createdAt")` index and newly created `"follow_follower_createdat_idx" btree (follower, "createdAt")` index. Because in some use cases I need to get just all users a person is following where first index may be more optimal. — Abhishek Sugandhi, May 07 '17 at 21:11

score 0 · Accepted Answer · edited May 23 '17 at 12:26

The multicolumn index on (follower, "createdAt") that user1937198 suggested is perfect for the query - as you found in your test already.

Since "createdAt" can be NULL (not defined NOT NULL), you may want to add NULLS LAST to query and index:

...
ORDER BY "follow"."createdAt" DESC NULLS LAST

And:

"follow_follower_createdat_idx" btree (follower, "createdAt" DESC NULLS LAST)

More:

PostgreSQL sort by datetime asc, null first?

There are minor other performance implications:

The multicolumn index on (follower, "createdAt") is 8 bytes per row bigger than the simple index on (follower) - 44 bytes vs 36. More (btree indexes have mostly the same page layout as tables):
- Making sense of Postgres row sizes
Columns involved in an index in any way cannot be changed with a HOT update. Adding more columns to an index might block this optimization - which seems particularly unlikely given the column name. And since you have another index on just ("createdAt") that's not an issue anyway. More:
- PostgreSQL Initial Database Size
There is no downside in having another index on just ("createdAt") (other than the maintenance cost for each (for write performance, not for read performance). Both indexes support different queries. You may or may not need the index on just ("createdAt") additionally. Detailed explanation:
- Is a composite index also good for queries on the first field?

Why does Postgres not use better index for my query?

1 Answers1