94

I know the importance of indexes and how order of joins can change performance. I've done a bunch of reading related to multi-column indexes and haven't found the answer to my question.

I'm curious if I do a multi-column index, if the order that they are specified matters at all. My guess is that it would not, and that the engine would treat them as a group, where ordering doesn't matter. But I wish to verify.

For example, from mysql's website (http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html)

CREATE TABLE test (
    id         INT NOT NULL,
    last_name  CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (last_name,first_name)
);

Would there be any benifit in any cases where the following would be better, or is it equivalent?

CREATE TABLE test (
    id         INT NOT NULL,
    last_name  CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (first_name,last_name)
);

Specificially:

INDEX name (last_name,first_name)

vs

INDEX name (first_name,last_name)
James Oravec
  • 19,579
  • 27
  • 94
  • 160

3 Answers3

161

When discussing multi-column indexes, I use an analogy to a telephone book. A telephone book is basically an index on last name, then first name. So the sort order is determined by which "column" is first. Searches fall into a few categories:

  1. If you look up people whose last name is Smith, you can find them easily because the book is sorted by last name.

  2. If you look up people whose first name is John, the telephone book doesn't help because the Johns are scattered throughout the book. You have to scan the whole telephone book to find them all.

  3. If you look up people with a specific last name Smith and a specific first name John, the book helps because you find the Smiths sorted together, and within that group of Smiths, the Johns are also found in sorted order.

If you had a telephone book sorted by first name then by last name, the sorting of the book would assist you in the above cases #2 and #3, but not case #1.

That explains cases for looking up exact values, but what if you're looking up by ranges of values? Say you wanted to find all people whose first name is John and whose last name begins with 'S' (Smith, Saunders, Staunton, Sherman, etc.). The Johns are sorted under 'J' within each last name, but if you want all Johns for all last names starting with 'S', the Johns are not grouped together. They're scattered again, so you end up having to scan through all the names with last name starting with 'S'. Whereas if the telephone book were organized by first name then by last name, you'd find all the Johns together, then within the Johns, all the 'S' last names would be grouped together.

So the order of columns in a multi-column index definitely matters. One type of query may need a certain column order for the index. If you have several types of queries, you might need several indexes to help them, with columns in different orders.

You can read my presentation How to Design Indexes, Really for more information, or the video.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 38
    I like the telepone book analogy a lot – Pascal Klein Nov 30 '16 at 17:16
  • Would a multicolumn index help a multicolumn order-by sort? Or does it only help mutlicolumn where constraints? – CMCDragonkai Apr 06 '17 at 03:51
  • 4
    @CMCDragonkai, again, think of the telephone book analogy. It *is* sorted by a multi-column key: `lastname`, `firstname`. If you do a query asking for data with `ORDER BY lastname, firstname`, then the query optimizer says "Hey! It's already stored in that order! I can just read it in its natural order and send it to the user, I don't have to re-sort it!" – Bill Karwin Apr 06 '17 at 05:49
  • 1
    But it wouldn't work if the ordering was ASC and DESC or DESC and ASC right? It would only work for ASC and ASC or DESC and DESC. – CMCDragonkai Apr 07 '17 at 01:41
  • 2
    @CMCDragonkai, Yes, that's an issue. Well done for making that connection so quickly, by the way. Many developers would not have been able to anticipate that. MySQL 8.0 is developing a feature to deal with that. When you create the index, you can declare which columns are ascending and which are descending. Then later if you search with the same combination of ASC and DESC matching the "direction" of the columns in that index, the query can be optimized with that index. See http://mysqlserverteam.com/mysql-8-0-labs-descending-indexes-in-mysql/ – Bill Karwin Apr 07 '17 at 02:22
  • How does order of multiple-column index affect in Inner Join queries like `Inner Join x on x.FieldA = some_col and x.FieldB = other_col`? I have observed a query where the performance is dependant on order of indexes i.e. using Index(FieldA, FieldB) was slower than using Index(FieldB, FieldA). What could be the reason? – Rohanil Aug 02 '19 at 10:43
  • In general, MySQL only uses one index per table. If you told it to `USE INDEX(FieldA, FieldB)` that does not mean use both, it means pick one of those two and ignore all other indexes. – Bill Karwin Aug 02 '19 at 14:23
  • @Rohanil the reason is given in another answer below: https://stackoverflow.com/a/24315330/1768736 – FBB Apr 16 '21 at 17:06
  • @BillKarwin you explain each and every think very good. but i have some type of confusion about name searching what if i search with **(first name or last name)** combination and some time search with **(last and first name)** combination and also search with **(first middle or last)** combination name. so what can i make three different combination indexes or can only one combination work for three different search scenario? – yasir shah May 20 '22 at 12:09
  • @yasirshah, You may need multiple indexes to support different query forms. Also, a query for `(firstname=? OR lastname=?)` is a problem. See my answer on this question: https://stackoverflow.com/questions/13750475/sql-performance-union-vs-or/13866221 – Bill Karwin May 20 '22 at 17:27
31

The two indexes are different. This is true in MySQL and in other databases. MySQL does a pretty good job of explaining the different in the documentation.

Consider the two indexes:

create index idx_lf on name(last_name, first_name);
create index idx_fl on name(first_name, last_name);

Both of these should work equally well on:

where last_name = XXX and first_name = YYY

idx_lf will be optimal for the following conditions:

where last_name = XXX
where last_name like 'X%'
where last_name = XXX and first_name like 'Y%'
where last_name = XXX order by first_name

idx_fl will be optimal for the following:

where first_name = YYY
where first_name like 'Y%'
where first_name = YYY and last_name like 'X%'
where first_name = XXX order by last_name

For many of these cases, both indexes could possibly be used, but one is optimal. For instance, consider idx_lf with the query:

where first_name = XXX order by last_name

MySQL could read the entire table using idx_lf and then do the filtering after the order by. I don't think this is an optimization option in practice (for MySQL), but that can happen in other databases.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
13

The general rule is that in a multi-column index, you want to put the most selective -- that is, the one that will give you fewest results -- first. So if you are creating a multiple-column index on a table with a status column of say 10 possible values, and also a dateAdded column, and you're typically writing queries like

SELECT * FROM myTable WHERE status='active' and dateAdded='2010-10-01'

...then you'd want dateAdded first, because that would limit the scan to just a few rows rather than 10% (or whatever proportion are 'active') of your rows.

This takes a fair bit of thought and tuning; you should check out the Lahdenmaki and Leach book.

Graham Charles
  • 9,394
  • 3
  • 26
  • 41
  • I agree with the order when doing selects, but my question is related to the order of the columns in the multi-column index. – James Oravec Jun 19 '14 at 20:07
  • 1
    No, I was talking about index creation, too. But you create indexes to support the kinds of queries your database will be tasked with -- hence the example. – Graham Charles Jun 19 '14 at 20:11
  • 1
    In your specific example, it's likely that column order in the index wouldn't matter in most places, because First Name and Last Name distribution are roughly equal. But if you're in, say, Vietnam, where there are very few different surnames, then you'd want First Name first in the index. – Graham Charles Jun 19 '14 at 20:15
  • the order of column conditions does not matter in where clause, your case is totally wrong – Jack Zhu Dec 02 '20 at 03:51
  • 1
    I'll repeat, or re-repeat. I was *not* discussing column order in the WHERE clause. (The parser doesn't care.) Column order a multi-column index *does* matter. – Graham Charles Dec 04 '20 at 19:14