2

I'm looking into doing some query optimization and have a question regarding ordering of multiple indexes.

I know you can do indexes on multiple columns and that you have to match the order of the where clause with the index field order.

My question is about the ordering of the fields in the multi field index. For example if we had FirstName and LastName fields and we assume that LastName is more unique than FirstName, is there a benefit of doing the index in one ordering over the other?

E.g. Would an index of (LastName, FirstName) be expected to perform better than an index of (FirstName, LastName) or vise versa?

I heard from someone that you want to do the most narrowing item first in your where clauses, so if this is the case, I would assume we'd want the same in the indices, but I wish to confirm.

James Oravec
  • 19,579
  • 27
  • 94
  • 160
  • 1
    possible duplicate of [Which column to put first in index? Higher or lower cardinality?](http://stackoverflow.com/questions/12315496/which-column-to-put-first-in-index-higher-or-lower-cardinality) – Willem Renzema Apr 11 '15 at 11:37

2 Answers2

1

The order of the tests in the WHERE clause is irrelevant. And if you're matching both columns in the index, their order in the index is also irrelevant. The query optimizer will effectively concatenate the two fields in their index order, and then find that entry in the index in a single step.

The order of columns in a composite index only matters when you're matching a subset of the fields. The index can still be used if you're searching for some prefix of the index. E.g. (firstname, lastname) can be used if you do WHERE firstname = 'John', but not if you do WHERE lastname = 'Smith'. See Does the order of columns matter in a multi-column index if there is no value in the WHERE clause for more details about this.

Community
  • 1
  • 1
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • 1
    I strongly disagree that ordering in the index is irrelevant when selecting a subset of the data filtered on the leading attributes in an index. Putting the attribute with the greatest cardinality first reduces the number of read operations required to resolve the query. – symcbean Apr 10 '15 at 23:08
  • I said it's irrelevant when you're matching ALL the columns in the index, not when matching a subset. – Barmar Apr 10 '15 at 23:12
  • In such a case, I still think you're wrong, I was just trying to describe the more general case. – symcbean Apr 10 '15 at 23:16
  • 1
    You may be right, I have to think about it. Perhaps you should write your own answer that explains it. – Barmar Apr 10 '15 at 23:25
  • 1
    On reflection it won't make a difference for hash indexes but it will have a significant impact on B-trees (I forgot that some folk still use hash indexes) – symcbean Apr 10 '15 at 23:26
  • A proper answer would be a little verbose here. Whether the index has the high cardinality value first or the low cardinality first doesn't change the number of *logical* operations, the difference arises due to grouping of physical entries into pages (and pages in to physical disk sectors or even memory pages). There are also fewer bytes to compare to find a difference but that is very insignificant relative to the paging problem. – symcbean Apr 10 '15 at 23:40
0

(Too many responses to simply write comments.)

WHERE lastname='Fangs' AND firstname='Venom' -- The WHERE and the INDEX can be in either order. I disagree with the comment about cardinality. In this case, it only matters how many rows there are for the combination. The BTree drilldown is the same effort either way. (The number of bytes compared is minor compared to the number of index rows that need to be checked.)

WHERE firstname LIKE 'V%' AND lastname = 'Fangs' -- INDEX(lastname, firstname) is the best order for the INDEX. (Order in WHERE does not matter). Note that the '=' is first, then the 'range' (LIKE).

For InnoDB (the default engine), there is no "hash" index option.

If you have INDEX(a,b,c), a WHERE that mentions (a), or (a and b), or all three fields may use that INDEX. It can use that index if the tests are on "=" and ANDed together.

With a few exceptions, the first field(s) in the index need to be tested with '=' in the WHERE, then one more field can be a "range" test (as with the LIKE example, above).

For example, WHERE firstname LIKE 'V%' AND lastname LIKE 'F%' will use only one field (the first field) of any index. If you have both INDEX(lastname, firstname) and INDEX(firstname, lastname), cardinality matters.

All this, and more, is in my Cookbook on building the best INDEX from a SELECT.

Rick James
  • 135,179
  • 13
  • 127
  • 222