88

I've created composite indexes (indices for you mathematical folk) on tables before with an assumption of how they worked. I was just curious if my assumption is correct or not.

I assume that when you list the order of columns for the index, you are also specifying how the indexes will be grouped. For instance, if you have columns a, b, and c, and you specify the index in that same order a ASC, b ASC, and c ASC then the resultant index will essentially be many indexes for each "group" in a.

Is this correct? If not, what will the resultant index actually look like?

Joe Phillips
  • 49,743
  • 32
  • 103
  • 159
  • See here: [SQL Server covering indexes](http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/sql-server-covering-indexes) for a good explanation – SQLMenace Apr 27 '09 at 20:03
  • This seems like a composite index to me CREATE NONCLUSTERED INDEX idx_PeopleTest_Name_Id_FavoriteColor ON PeopleTest(Name, Id, FavoriteColor) – SQLMenace Apr 27 '09 at 20:08

5 Answers5

113

Composite indexes work just like regular indexes, except they have multi-values keys.

If you define an index on the fields (a,b,c) , the records are sorted first on a, then b, then c.

Example:

| A | B | C |
-------------
| 1 | 2 | 3 |
| 1 | 4 | 2 |
| 1 | 4 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | 4 |
| 2 | 4 | 5 |
Rik
  • 28,507
  • 14
  • 48
  • 67
  • 61
    Please note also that indexes are stored as Btree, so an (a,b,c) index will help on a search on (a) and on (a, b), but *not* on other searches like (b) or (b,c). – geek-merlin Apr 30 '16 at 17:53
44

Composite index is like a plain alphabet index in a dictionary, but covering two or more letters, like this:

AA - page 1
AB - page 12

etc.

Table rows are ordered first by the first column in the index, then by the second one etc.

It's usable when you search by both columns OR by first column. If your index is like this:

AA - page 1
AB - page 12
…
AZ - page 245
BA - page 246
…

you can use it for searching on 2 letters ( = 2 columns in a table), or like a plain index on one letter:

A - page 1
B - page 246
…

Note that in case of a dictionary, the pages themself are alphabetically ordered. That's an example of a CLUSTERED index.

In a plain, non-CLUSTERED index, the references to pages are ordered, like in a history book:

Gaul, Alesia: pages 12, 56, 78
Gaul, Augustodonum Aeduorum: page 145
…
Gaul, Vellaunodunum: page 24
Egypt, Alexandria: pages 56, 194, 213, 234, 267

Composite indexes may also be used when you ORDER BY two or more columns. In this case a DESC clause may come handy.

See this article in my blog about using DESC clause in a composite index:

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
21

The most common implementation of indices uses B-trees to allow somewhat rapid lookups, and also reasonably rapid range scans. It's too much to explain here, but here's the Wikipedia article on B-trees. And you are right, the first column you declare in the create index will be the high order column in the resulting B-tree.

A search on the high order column amounts to a range scan, and a B-tree index can be very useful for such a search. The easiest way to see this is by analogy with the old card catalogs you have in libraries that have not yet converted to on line catalogs.

If you are looking for all the cards for Authors whose last name is "Clemens", you just go to the author catalog, and very quickly find a drawer that says "CLE- CLI" on the front. That's the right drawer. Now you do a kind of informal binary search in that drawer to quickly find all the cards that say "Clemens, Roger", or "Clemens, Samuel" on them.

But suppose you want to find all the cards for the authors whose first name is "Samuel". Now you're up the creek, because those cards are not gathered together in one place in the Author catalog. A similar phenomenon happens with composite indices in a database.

Different DBMSes differ in how clever their optimizer is at detecting index range scans, and accurately estimating their cost. And not all indices are B-trees. You'll have to read the docs for your specific DBMS to get the real info.

Walter Mitty
  • 18,205
  • 2
  • 28
  • 58
  • thanks, I've been thinking about this problem hard, with no clear answer. "A search on the high order column amounts to a range scan,", but what if the index covers 2 columns, and both columns are specified in a range query, like "ColumnA < threshold1 AND columnA > threshold 2 AND columnB < threshold3 AND columnB > threshold4 ", it seems that oracle would have to spend MULTIPLE range scans on the B-tree, right? then in case we have many columns in the composite index, we have to do MANY range scans, and the effectiveness of index becomes greatly reduced – teddy teddy May 15 '12 at 16:52
  • In my answer, I meant that ColumnA = value amounts to a range scan, because there may be many entries that all have the right value for ColumnA but different values for ColumnB. The case you outline is quite different. It might still be a range scan, but the range could involve a large percentage of the entries in the index. The larger the range, the less the index saves you. If the value of using the index drops too low, the optimizer might opt for a different strategy. – Walter Mitty Feb 09 '13 at 02:00
4

No. Resultant index will be single index but with compound key.

KeyX = A,B,C,D; KeyY = 1,2,3,4;

Index KeyX, KeyY will be actually: A1,A2,A3,B1,B3,C3,C4,D2

So that in case you need to find something by KeyX and KeyY - that will be fast and will use single index. Something like SELECT ... WHERE KeyX = "B" AND KeyY = 3.

But it's important to understand: WHERE KeyX = ? requests will use that index, while WHERE KeyY = ? will NOT use such index at all.

Mash
  • 877
  • 6
  • 16
  • The last assertion isn't true on Oracle. See http://stackoverflow.com/questions/57878/sql-oracle-when-indexes-on-multiple-columns-can-be-used (ignore the - incorrect - accepted answer). – Hobo Apr 27 '09 at 20:26
  • @Hobo: 1. In most RDBMS skip scan not available. 2. In most situations that would be VERY slow, just a bit faster (and sometimes even slower) than simple table scan (in very rare situations it will really help). No magic in Oracle. Just a good rule to remember - DON'T rely on compound index if your criteria not using only top level columns from index (that's very common mistake to create large compound indexes). – Mash Apr 27 '09 at 20:43
  • @Mash Points taken. Definitely not saying skip scans are a silver bullet, just that there are circumstances where KeyY = ? _will_ use an index. Figured it's best to give as full a picture as possible. As for the speed, would hope the optimizer would pick the appropriate approach (though, as always, would measure rather than assume if in doubt) – Hobo Apr 27 '09 at 21:40
  • @Hobo. I think since it's a novice question - it's better not to give FULL picture, but smaller one at first. As of optimizer, you know - lots of studies shown that AI of Oracle optimizer smarter than it should and Oracle 10 slower most of the time than Oracle 9 in practice just because it's too smart in theory. – Mash Apr 28 '09 at 08:04
0

Which queries can be sped up by composite indices or not

In general, a composite index can only significantly speed up inequalities on the last column.

E.g. a x-y composite B-tree index would:

  • efficiently speed up:
    • x = 1 and y = 2: equalities in both columns
    • x = 1 and y > 2: equality on first column plus inequality on the second column
  • very limited speed up:
    • x > 1 and y > 2: inequalities on both columns, including the first
    • x > 1 and y = 2: inequality on the first column
    • y > 2: this is equivalent to x > -infinity and y > 2, so it is just the worse case possible for the compound B-tree index. Note however, this case could be solved efficiently with a B-tree index.

If you need inequalities on both columns, then you should look into spatial indices such as R-trees, I've given more details at: What is a SPATIAL INDEX and when should I use it?

E.g. consider the following index:

x|y

1|1
1|2
1|3
1|4
1|5
1|6

2|2
2|2
2|2
2|3
2|3
2|3
2|4
2|4
2|4

4|2
4|2
4|2
4|3
4|3
4|3
4|4
4|4
4|4

5|3
5|4
5|5
5|6
5|7
5|8

An index can only speed up a query very significantly if all the rows are side to side in the index.

So if for example we want:

x = 5 and y > 4

we get the contiguous:

5|5
5|6
5|7
5|8

But if we want:

x > 0 and y > 4

the result set would not be contiguous, and would mean a bunch of useless scanning:

1|5
1|6
5|5
5|6
5|7
5|8
Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985