0

The index:

CREATE INDEX message_index ON message(queue_id, target_client_id, timestamp ASC, source_client_id);

The query:

EXPLAIN ANALYZE 
SELECT content 
  FROM message 
WHERE message.queue_id = 1 
  AND message.source_client_id = 1 
  AND (message.target_client_id = -1 OR message.target_client_id = 1);

Output:

 Bitmap Heap Scan on message  (cost=8.87..12.89 rows=1 width=13) (actual time=0.022..0.026 rows=50 loops=1)
   Recheck Cond: (((queue_id = 1) AND (target_client_id = (-1)) AND (source_client_id = 1)) OR ((queue_id = 1) AND (target_client_id = 1) AND (source_client
_id = 1)))
   ->  BitmapOr  (cost=8.87..8.87 rows=1 width=0) (actual time=0.017..0.017 rows=0 loops=1)
         ->  Bitmap Index Scan on message_index  (cost=0.00..4.43 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=1)
               Index Cond: ((queue_id = 1) AND (target_client_id = (-1)) AND (source_client_id = 1))
         ->  Bitmap Index Scan on message_index  (cost=0.00..4.44 rows=1 width=0) (actual time=0.006..0.006 rows=50 loops=1)
               Index Cond: ((queue_id = 1) AND (target_client_id = 1) AND (source_client_id = 1))

How can this query use the index with regards to source_client_id i.e. the rightmost column in the composite index without the third column (timestamp) being involved in the query at all?

According to the last answer here How important is the order of columns in indexes? this should not be valid. What am I missing?

Community
  • 1
  • 1
ben
  • 5,671
  • 4
  • 27
  • 55
  • 1
    It effectively used predicate `timestamp = 'anything'` for the 3-rd column. BTW `timestamp` is a bad name for a table column. It is a reserved word and has no business meaning. – Ihor Romanchenko Oct 22 '14 at 09:37
  • I posted three questions during the last 24 hours. Each contained the column "timestamp" and each time somebody mentions it. SO is great ;) @a_horse_with_no_name if you consider it appropriate you may post your comment as answer since I find it sufficient for this question. – ben Oct 22 '14 at 14:16

1 Answers1

1

Postgres can use other columns than the leading one for an index lookup - this is just not as efficient as using the leftmost column. Postgres will scan the entire index in that case (instead of the table). Whereas for conditions on the left most column(s) Postgres will only retrieve those rows from the index that match the condition. The difference in efficiency is therefore the number of index entries that are processed.

I think this is somehow hidden behind the following sentence from the manual:

Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned.

Where "these columns" refers to the left most columns.


Btw: the reason why timestamp (or date or number) is such a bad choice for a column is that it's a reserved word. But more importantly: the name doesn't document anything. Someone not familiar with the data model has no clue what you are storing there. The time the message was sent? The time the message was received? The time the message was last updated?

  • 1
    "Postgres will scan the entire index". I think I know what you mean by that but I'm not sure whether this statement itself is correct. If one has an index on A,B,C and the query is on A,C then postgres will scan the entire B "level" of the index but for example it does not necessarily scan the complete A level right? – ben Oct 22 '14 at 15:49
  • @ben: Postgres will to scan (i.e. retrieve) **all** blocks of the index. There is no way to retrieve only the value for `C` without retrieving `A` **and** `B` –  Oct 22 '14 at 15:52
  • But suppose that the query contains in the `WHERE` clause `A=3`. Isn't it then so that all other branches other than `A=3` on the A level say for example `A=100` do not have to be traversed downwards (over B and C) since any leaf node of that branch will hold a value which is surely excluded from the query result? – ben Oct 22 '14 at 16:44
  • @ben: yes. If the leftmost index column is used in an equality condition you are right. I thought you were talking about conditions that _only_ use column `C` –  Oct 22 '14 at 16:54