0

I have a table with 6 million rows, I need to look up six columns to select the items. So I created an index with six columns.

Does changing the order of the columns in the index affect the query speed?

Does the speed of query depend on the number of unique items each column has? or the total number of possible results the condition on the column has?

I tried reordering the columns in the index and there seems to be some significant change in the query speed

table: pic_tag_relations

Columns:

  • tag_id (int) (1-1,000,000) (cardinality: lowest)
  • contrast_score(float) (0-24)
  • quality_score (float) (0-10)
  • pic_ratio_type(tinyint) (0/1/2)
  • is_okay (tinyint) (0/1)
  • already_used_count (integer) (1-10)

Query:

select * from pic_tag_relations where tag_id in ($all_tag_ids) && contrast_score>2.5 && quality_score>3 && is_okay=1 && pic_ratio_type='2' && already_used_count<10

Mr poppins
  • 53
  • 1
  • 12
  • 1
    It's probably best to have the index with the highest cardinality first. – Barmar Apr 05 '21 at 14:56
  • 2
    Yes, it does have a strong effect. Please add the queries you'll be running to the question. – The Impaler Apr 05 '21 at 14:56
  • 2
    I would argue that the more specific fields should appear first in the index (but some might disagree). – Tim Biegeleisen Apr 05 '21 at 14:56
  • Higher "selectivity" columns first. – The Impaler Apr 05 '21 at 14:57
  • @Barmar Would you explain the why please – Mr poppins Apr 05 '21 at 14:59
  • 1
    My first instinct is to say: An index is an index, no matter how it was build. But [the manual begs differ](https://dev.mysql.com/doc/refman/8.0/en/multiple-column-indexes.html): _"If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table."_ – KIKO Software Apr 05 '21 at 15:00
  • @TimBiegeleisen I'm of the same opinion as you, filter out the majority of results first... I'd love to read someone explain it way better than I have conceptualized – Mr poppins Apr 05 '21 at 15:01
  • @KIKOSoftware "several kinds of queries" refers to queries that specify different sets of columns to match. – Barmar Apr 05 '21 at 15:01
  • @Barmar: yes, true, but it states that the order of the columns does matter. – KIKO Software Apr 05 '21 at 15:02
  • 1
    @KIKOSoftware Because a query that skips some columns in the order will not be able to take advantage of the full index. But the question is about queries that use all the columns. – Barmar Apr 05 '21 at 15:03
  • @Barmar: I am aware of that. – KIKO Software Apr 05 '21 at 15:04
  • @KIKOSoftware order of the index should match the order of the columns mentioned in the query? – Mr poppins Apr 05 '21 at 15:05
  • That's not how MySQL works. It first dissects the queries into its relevant parts and then decides on the best order of the columns. This could very well depend on the order of the columns in your index. Of course the freedom to choose depends on the exact query you have. – KIKO Software Apr 05 '21 at 15:06
  • @KIKOSoftware yes, that's my understanding too, thanks for clarifying. As Barmar said, manual is possibly referring to using the first (1/2/3) columns used in the index rather than skipping the order and querying columns in between in the index order – Mr poppins Apr 05 '21 at 15:10
  • @NicoHaase yes, I have the same question, more columns in the index, I'm looking into the answers, thank you – Mr poppins Apr 05 '21 at 15:24
  • @NicoHaase This question is significantly different from the 2 column index that the official MySQL manual addresses, it uses multiple operators too, worth staying as a separate question – Mr poppins Apr 05 '21 at 15:34

2 Answers2

1

The ordering of the columns in the index affects what queries can be used. If the conditions are all equality, then the ordering makes little difference.

There can be a small difference between having keys with lots of values first versus keys with a small number of values. However, I would not worry about the different in performance when the index is being used optimally.

Much more important is to have the columns in the index with the equality conditions being the first columns (whatever order), so the index is used effectively.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

I answered nearly the same question here: Does Order of Fields of Multi-Column Index in MySQL Matter

Yes, the order of columns does matter. To choose the right order of columns, you should have a specific query in mind. You haven't shown a query yet in this question.

If your query conditions are all = comparisons then the order of columns does not matter much. Some people try to micro-optimize by putting more selective columns to the left, but I find this does not make a significant difference.

For example, if you are looking up someone in the phone book by their full name, do you really care if the book is sorted by last name first or first name first? You're going to filter on both fields anyway.

But if you have a mix of = comparisons and other types of comparison operators, then the column order really does matter.

Basically, you can have as many = terms as you want, and the columns for these comparisons must be on the left end of the list of columns in the index. Then you can have ONE column in the index for an inequality comparison. Any further columns you search will not use the index.

You mentioned in a comment that you have a query with 3 comparisons with <. Like this:

... WHERE a < 1 AND b < 2 AND c < 3

This can use an index on at most one of these columns. In this case, you should choose the condition that will be most selective, i.e. that will match the smallest subset of rows.

If the terms used =:

... WHERE a = 1 AND b = 2 AND c = 3

Then all columns of a multi-column index on (a, b, c) could help the search, and the order of columns would not matter.

If you had a mix of comparisons:

... WHERE a = 1 AND b < 2 AND c < 3

Then you may have an index with a as the leftmost column, and one of b or c following it. But if you had both b and c, still only one would be used for the search — the first one that follows a.

See my presentation How to Design Indexes, Really or the video.


Re your updated question with an example query:

select * from pic_tag_relations where tag_id in ($all_tag_ids) && contrast_score>2.5 && quality_score>3 && is_okay=1 && pic_ratio_type='2' && already_used_count<10

You could have an index with (is_okay, pic_ratio_type) as the first columns, then you get to choose which one of the other columns should be the third column. The other columns are referenced in terms that use some comparison operator other than =. Which column is best would be based on how well each condition is likely to select a smaller subset of rows.

The other terms will also filter out rows, but not by using the index. They'll have to compare to the rows one by one as they are returned by the indexed search.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thank you, I have a mix of equal and comparison operations. I haven't watched the video yet. by "then the column order really does matter." I assume you mean "=" operation columns must be put on the left (first in the index) – Mr poppins Apr 05 '21 at 15:17
  • 1
    I updated my answer with some more examples. – Bill Karwin Apr 05 '21 at 15:20
  • Thanks. That certainly helps, I'll reduce the number of columns in the index – Mr poppins Apr 05 '21 at 15:37