I've been told, and have read everywhere (though no one dared to explain why), that when composing an index on multiple columns I should put the most selective column first, for performance reasons. Why is that? Is it a myth?
wow, so many answers to questions I didn't ask – milan Nov 24 '10 at 07:25
4 Answers
You can omit columns from right to left when using an index, i.e. when you have an index on (col_a, col_b) you can use it in WHERE col_a = x but you cannot use it in WHERE col_b = x.
Imagine a telephone book that is sorted by first name and then by last name.
At least in Europe and the US, first names have a much lower selectivity than last names, so looking up the first name wouldn't narrow the result set much, and there would still be many pages to check for the correct last name.
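To see the leading-column rule in action, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here, not Oracle, and the table t, the index idx_a_b and the sample data are made up for illustration; the exact plan wording also varies between SQLite versions.

```python
import sqlite3

def plan_for(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical table and composite index, named after the answer's example.
conn.execute("CREATE TABLE t (col_a INTEGER, col_b INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_a_b ON t (col_a, col_b)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i % 100, i, "row %d" % i) for i in range(1000)],
)

# Filter on the leading column: the planner can search the index.
plan_leading = plan_for(conn, "SELECT * FROM t WHERE col_a = 7")
# Filter only on the trailing column: no leading edge to search on.
plan_trailing = plan_for(conn, "SELECT * FROM t WHERE col_b = 7")

print(plan_leading)
print(plan_trailing)
```

On the SQLite builds I would expect, the first plan is an index search on idx_a_b and the second falls back to scanning the table. Oracle's optimizer has more options (such as the index skip scan mentioned in the comments), so treat this as an illustration of the principle only.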

+1. You can still use the index if leading columns are missing, but it would be a full index scan (or an index skip scan), which is not all that efficient (though it could still be better than a full table scan). – Thilo Nov 24 '10 at 01:38
I think at least in Europe and US first names have a much lower selectivity than last names, so an index by first name first wouldn't be of much help. – AndreKR Nov 24 '10 at 01:43
AndreKR, true, but that is dependent on the index being specified with the most selective column left-most. If you add that to your answer I will give you +1. – PerformanceDBA Nov 28 '10 at 07:42
@PerformanceDBA I don't quite understand what your point is. Could you elaborate? – AndreKR Nov 28 '10 at 21:01
I should put the most selective column first
According to Tom, column selectivity has no performance impact for queries that use all the columns in the index (it does affect Oracle's ability to compress the index).
It is not the first thing, it is not the most important thing. Sure, it is something to consider but it is relatively far down there in the grand scheme of things.
In certain strange, very peculiar and abnormal cases (like the above with really utterly skewed data), the selectivity could easily matter. HOWEVER, they are a) pretty rare and b) truly dependent on the values used at runtime, as all skewed queries are.
So in general, look at the questions you have, and try to minimize the indexes you need based on that.
The number of distinct values in a column in a concatenated index is not relevant when considering the position in the index.
However, these considerations should come second when deciding on index column order. More important is to ensure that the index can be useful to many queries, so the column order has to reflect the use of those columns (or the lack thereof) in the WHERE clauses of your queries (for the reason illustrated by AndreKR).
HOW YOU USE the index -- that is what is relevant when deciding.
All other things being equal, I would still put the most selective column first. It just feels right...
Update: Another quote from Tom (thanks to milan for finding it).
In Oracle 5 (yes, version 5!), there was an argument for placing the most selective columns first in an index.
Since then, it is not true that putting the most discriminating entries first in the index will make the index smaller or more efficient. It seems like it will, but it will not.
With index key compression, there is a compelling argument to go the other way since it can make the index smaller. However, it should be driven by how you use the index, as previously stated.
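To make the compression argument concrete, here is a toy model in Python. It is a sketch under loud assumptions: it only counts how many distinct leading-column values each fixed-size "leaf block" would have to store once, it is not Oracle's actual on-disk format, and the statuses, order ids and block size are invented.

```python
from itertools import product

def stored_prefixes(rows, block_size):
    """Count prefix entries in a toy model of prefix compression.

    Rows are sorted like index keys, split into fixed-size leaf blocks,
    and each block stores every distinct leading-column value once.
    """
    ordered = sorted(rows)
    total = 0
    for start in range(0, len(ordered), block_size):
        block = ordered[start:start + block_size]
        total += len({key[0] for key in block})
    return total

# Hypothetical data: 4 statuses x 1000 order ids = 4000 index entries.
statuses = ["new", "paid", "shipped", "closed"]
rows_low_first = [(s, i) for s, i in product(statuses, range(1000))]
rows_high_first = [(i, s) for s, i in rows_low_first]

# Low-cardinality column leading: long runs of identical prefixes.
low_first = stored_prefixes(rows_low_first, block_size=100)
# High-cardinality column leading: almost every prefix is different.
high_first = stored_prefixes(rows_high_first, block_size=100)

print(low_first, high_first)  # 40 1000
```

In this model the (status, id) order needs 40 prefix entries while the (id, status) order needs 1000, which is the intuition behind putting the less selective column first when compression is the goal.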

You have the index compression info as a little bit of a side note, but it shouldn't be ignored. There are a lot of scenarios in which compressing an index is a fabulous idea. – Craig Nov 24 '10 at 04:00
@Craig: I can see how column ordering would have an impact on index compression, but would that not work the other way around (low-cardinality leading columns resulting in repetitive, compressible prefixes)? – Thilo Nov 24 '10 at 05:05
Tom said that for **Oracle 5** http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1296165726968#59899084713981 – milan Nov 24 '10 at 07:28
The ordering of the columns in the index should be determined by your queries and not by any selectivity considerations. If you have an index on (a,b,c), and most of your single-column queries are against column c, followed by a, then put them in the order c,a,b in the index definition for the best efficiency. Oracle prefers to use the leading edge of the index for the query, but can use other columns in the index in a less efficient access path known as skip-scan.
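One rough way to compare column orders against a workload is to count how many queries can start from the leading edge. The sketch below assumes equality filters only, and the workload frequencies are invented for illustration; it is not how Oracle's optimizer costs anything.

```python
def usable_prefix(index_columns, filtered_columns):
    """Count the leading index columns covered by equality filters."""
    count = 0
    for column in index_columns:
        if column not in filtered_columns:
            break
        count += 1
    return count

# Hypothetical workload: (columns filtered on, how often the query runs).
workload = [({"c"}, 70), ({"a"}, 20), ({"a", "b"}, 10)]

def served_share(index_columns):
    """Total frequency of queries that can use the index's leading edge."""
    return sum(
        freq
        for cols, freq in workload
        if usable_prefix(index_columns, cols) > 0
    )

for order in [("a", "b", "c"), ("c", "a", "b")]:
    print(order, served_share(order))
# ('a', 'b', 'c') 30
# ('c', 'a', 'b') 70
```

With these made-up numbers, (c,a,b) serves the frequent queries on c from its leading edge. Queries filtering only on a still miss the leading edge of that index, so they would depend on a skip-scan or on a separate index.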

The more selective your index is, the faster the lookup.
Simply imagine a phone book: you can usually find someone quickly by last name. But if many people share the same last name, you will spend more time checking the first name of each of them.
So you should put the most selective columns first to avoid this problem as much as possible.
Additionally, you should make sure that your queries actually use these selectivity criteria.

+1. That's exactly right. Assuming that has been done, (AndreKR) columns can be dropped from right to left. – PerformanceDBA Nov 28 '10 at 07:40