0

I understand why indexes are important. I understand why order matters when you query using only a few columns from the multi-column index. What I don't understand is why there seems to be a rule of thumb to order from greatest to least cardinality/selectivity. I don't see where any performance gains would stem from, and I've found conflicting results.

How important is the order of columns in indexes?

In the above link, the top answer genuinely does not seem to make sense. I would think it takes just as many steps to find the right combination no matter which order you perform your binary search operations.

Suppose I have the following table:

A   B
--  --
1   a
2   a
3   a
4   a
5   b
6   b
7   b
8   b

Let's say I'm looking for (1,a). If I perform binary search on column A first, I would need three steps to get '1' and then 'a' is a given. If I perform binary search on column B first, I would need 1 step to narrow my results down to 'a' and then 2 more steps to get '1' (binary search on numbers 1-4). Either way, I am performing the same number of steps (3 total), right?

That's besides the fact that the last 3 answers there say order does NOT matter when all columns are provided in the search query.

Oracle: does the column order matter in an index?

In the above link, the top answer states the column with the LEAST distinct values should be put first. In fact, that makes more sense to me since there seem to actually be performance gains due to skip reads. (And if someone could Eli5 index compression to me, that would be great.)

However, the last answer linked to Oracle's docs which explicitly state that ordering from most selective to least selective best improves query performance.

And this is all assuming there is what I'll refer to as a layer-by-layer binary search on the different columns in the index (which multiple sources I've read say does not happen).

So what am I to believe?

Azianese
  • 554
  • 7
  • 21
  • 2
    You are correct that the search density is the same regardless of column order, but only when both predicates are equality predicates. For a predicate like (X=5 and Y>4), an index on Y,X will be significantly less efficient than X,Y. – SQLRaptor Jul 11 '19 at 03:21
  • That makes sense. I hadn't thought of that. Thanks! – Azianese Jul 11 '19 at 17:57
  • There's more to the subject, and a lot of it is implementation specific. For example, SQL Server collects detailed histograms on the leading column only. All multi-column estimations are based on the density vector. I'm sure Oracle has its own quirks, and even different versions will behave differently. In most cases the application is the one that will determine which data domains are most frequently queried for using equality predicates, which data domains will be frequently ordered by / grouped by / range filtered by etc. – SQLRaptor Jul 11 '19 at 18:50

0 Answers0