25

I am looking into adding a composite index to a table in a MySQL database which will likely be several million rows in size. The composite will be comprised of two varchar columns as well as three int columns.

My question is as stated in the title: Is there an optimal order in which to create this composite index?

For instance, one of the int rows will likely only have 6 possible values, would it better for that column to be closer to the front of the index definition? Likewise, one of the varchar columns will likely have millions of different values, should that be near the front or back of the index definition?

Yves M.
  • 29,855
  • 23
  • 108
  • 144
chicagoCrazy
  • 374
  • 1
  • 4
  • 12

1 Answers1

48

As a rule of thumb, in a multi-column index, you want the columns that have the highest cardinality, or in other words, the highest number of distinct values, to come first in the index.

To be more accurate, you want the column with the fewest possible matches to your search criteria first so you can narrow the result set down as much as possible, but in general, it's the same as the highest cardinality.

So, in your example, you'll want the column that will have millions of distinct values to be in the index before the one with only 6 distinct values.

Assuming you're selecting only one row out of the millions of values, it allows you to eliminate more rows faster.

When considering two columns of similar cardinality, put the smaller one first (INTEGER columns before VARCHAR columns) because MySQL can compare and iterate over them faster.

One caveat is that if you are selecting with ranges (eg. WHERE datecol > NOW()), then you want the range columns farthest to the right, and your columns with a single constant (eg. WHERE id = 1) to the left. This is because your index can only be used for searching and ordering up to the point of the first range value.

Yves M.
  • 29,855
  • 23
  • 108
  • 144
Marcus Adams
  • 53,009
  • 9
  • 91
  • 143
  • 8
    Cardinality is not always the right criterion: I have a table with a multi-column index where the first column in the index has only two possible values, while subsequent columns have higher cardinality. This works because the index is designed for a specific search that always uses only 1 of the 2 values, and that value reduces the result set by 95%. The right question to ask is which column will reduce the result set the most, and cardinality is just a useful rule of thumb. – Kai Pommerenke Dec 21 '12 at 19:36
  • 4
    @Kai, your comment confuses me. Maybe you could give a specific example. – Marcus Adams Jan 09 '13 at 15:11
  • 1
    @MarcusAdams I understand what Kai is saying, I think it is explained pretty well here: http://www.percona.com/blog/2009/06/05/a-rule-of-thumb-for-choosing-column-order-in-indexes/ – Shane N Mar 11 '15 at 18:20
  • Thanks, I read the comment just now, and I understand. I improved my answer based on the comment. – Marcus Adams Mar 11 '15 at 20:50
  • what would be considered 'similar' cardinality? – bermick Sep 10 '21 at 12:57
  • these results seem to indicate the opposite https://news.ycombinator.com/item?id=34404641 – Mattwmaster58 Jun 22 '23 at 12:50