Index scan for multicolumn comparison - non-uniform index column ordering

Question

This question is closely related to Enforcing index scan for multicolumn comparison

The solution there is perfect, but seems to works only if all index columns have same ordering. This question is different because column b is desc here, and this fact stops from using row-syntax to solve the same problem. This is why I'm looking for another solution.

Suppose index is built for 3 columns (a asc, b DESC, c asc), I want Postgres to:

find key [a=10, b=20, c=30] in that B-tree,
scan next 10 entries and return them.

If the index has only one column the solution is obvious:

select * from table1 where a >= 10 order by a limit 10

But if there are more columns the solution becomes much more complex. For 3 columns:

select * from table1
where a > 10 or (a = 10 and (b < 20 or b = 20 and c <= 30))
order by a, b DESC, c
limit 10;

How can I tell Postgres that I want this operation?

And can I be sure that even for those complex queries for 2+ columns the optimizer will always understand that he should perform range-scan? Why?

Have you tried when you select a and not *? – Saad Ahmad Apr 10 '19 at 17:29 — Saad Ahmad, Apr 10 '19 at 17:29

The Impaler · Accepted Answer · 2019-04-10T18:56:22.097

2

PostgreSQL implements tuples very thoroughly, (unlike half implementations found in Oracle, DB2, SQL Server, etc.). You can write your condition using "tuples inequality", as in:

select * 
from table1
where (a, -b, c) >= (10, -20, 30)
order by a, -b, c
limit 10

Please note that since the second column is in descending order, you must "invert" its value during the comparison. That's why it's expressed as -b and also, -20. This can be tricky for non-numeric columns such as dates, varchars, LOBs, etc.

Finally, the use of an index is still possible with the -b column value if you create an ad-hoc index, such as:

create index ix1 on table1 (a, (-b), c);

However, you can never force PostgreSQL to use an index. SQL is a declarative language, not an imperative one. You can entice it to do it by keeping table stats up to date, and also by selecting a small number of rows. If your LIMIT is too big, PostgreSQL may be inclined to use a full table scan instead.

edited Apr 10 '19 at 18:56

answered Apr 10 '19 at 17:36

The Impaler

45,731
9
39
76

This is all interesting. But we still cannot use the given index `(a ASC, b DESC, c ASC)` for the query fully. Postgres reads all index tuples with `a >= 10` and ***filters*** the rest on the whole ROW value. An index on just `(a)` typically does a better job - with the notable exception of index-only scans where possible. And only *some* data types have a negator `-` defined. `text`, for instance, has none. – Erwin Brandstetter Apr 10 '19 at 17:47
I agree, I don't see how you can use the index `(a, b DESC, c)`. Extending your second query with equalities and inequalities, is the only solution I see to use this index. But, yes, it gets cumbersome the more columns you add. – The Impaler Apr 10 '19 at 17:57
Your workaround with the expression index on `(a, (-b), c)` can work fully after all - if you adapt `ORDER BY` to match as well: **`order by a, -b, c`** (not `order by a, b desc, c)`. Then we get an index scan without additional `FILTER` step. – Erwin Brandstetter Apr 10 '19 at 18:17
SQL Server doesn't implement tuples at all, as far as I'm concerned. – Gordon Linoff Apr 10 '19 at 19:43
@GordonLinoff Duly noted. – The Impaler Apr 10 '19 at 19:46

Erwin Brandstetter · Answer 2 · 2019-05-06T17:03:05.623

Strictly speaking, your index on (a ASC, b DESC, c ASC) can still be used, but only based on the leading expression a. See:

It's usefulness is limited and Postgres will only use it if the predicate on a alone is selective enough (less than roughly 5% of all rows have a >= 10). (Or possibly to profit from an index-only scans where possible.) But all index tuples qualifying on only a have to be read and you will see a FILTER step in the query plan to discard non-qualifying rows - both adding additional cost. An index on just (a) typically does a better job as it's smaller and cheaper to maintain.

I have tried and failed in the past to make full use of an index with non-uniform sort order (ASC | DESC) like you display for ROW value comparison. I am pretty certain it's not possible. Think of it: Postgres compares whole row values, which can either be greater or smaller, but not both at the same time.

There are workarounds for datatypes with a defined negator (like - for numeric data types). See the solution provided by "The Impaler"! The trick is to invert values and wrap it in an expression index to get uniform sort order for all index expressions after all - which is currently the only way to tap into the full potential of row comparison. Be sure to make both WHERE conditions and ORDER BY match the special index.

Yeah, I was debating on how to alternate `<=` and `>=` on a per column basis for the tuple comparison, but SQL is not expressive enough (yet?) to do so, afaik. +1 — The Impaler, Apr 10 '19 at 19:02

Index scan for multicolumn comparison - non-uniform index column ordering

2 Answers2

Linked