2

The problem I have is the following:

  • I have a table that contains about 100000000 rows
  • it has 22 fields - some numeric, some text
  • it has a primary key id (auto-incremented integer)
  • it has a field another_id of type bigint, and a unique key on it
  • it has a field called state that can take only 4 integer values (0 to 3)
  • I need that the queries of the following form are executed as fast as possible:
SELECT COUNT(*) 
FROM my_table 
WHERE another_id IN ( <about 100 values> ) 
AND state = ...

for different values of state.

How should the index look like? I was thinking about two options:

  • KEY another_id:state (another_id, state)
  • KEY state:another_id (state, another_id)

Is there any difference in performance between those two variants? Is there anything else to consider?

Edit: engine is InnoDB

Andrey Sh
  • 106
  • 5
  • If a column can only take 4 values, it's not going to be a very good discriminant (its cardinality is not very high). OTOH, a unique column is very discriminating. So you should put the unique index first in the composite index. – Barmar Jul 13 '22 at 15:58

2 Answers2

1

For the query you show, you should create the index with state, another_id in that order.

Define the index with any columns referenced in equality conditions first, after them add one column referenced in a range condition or ORDER BY or GROUP BY.

You may also like my answer to Does Order of Fields of Multi-Column Index in MySQL Matter or my presentation How to Design Indexes, Really, or the video.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • About "column referenced in a range condition": In my question, there is an "IN" condition (a value from a given set), not a range condition - would it influence your answer? – Andrey Sh Jul 13 '22 at 16:15
  • I count any expression other than `=` or `IS NULL` as a range condition. They all follow the principle I described above, whether they are `IN()`, `BETWEEN`, `>`, `<`, `!=`, or even `IS NOT NULL`. If the condition can match multiple values, i.e. it's not an equality condition, therefore it's in the broad category of range conditions. – Bill Karwin Jul 13 '22 at 16:47
  • So far, there were two responses to my question, and they contradict each other :) – Andrey Sh Jul 15 '22 at 16:05
  • @barmar says the order should be "another_id, state" because the IDs (which are unique) are more discriminating than the states; Bill Karwin (this answer) says the order should be "state, another_id" because the condition on another_id is not an equality but can match multiple values. I agree that both points might be valid - is there any way to decide which of those considerations outweigh the other? As the additional context - the DB engine is InnoDB, so I believe HASH/BTREE choice is done for me by the DB. – Andrey Sh Jul 15 '22 at 16:11
  • 1
    Yes, you can determine by testing both suggestions. Use EXPLAIN to see how they are using the columns of the index. Using performance profiling to see how many rows they have to read, and ultimately how fast they are. See my presentation [SQL query patterns, optimized](https://www.slideshare.net/billkarwin/sql-query-patterns-optimized) for techniques of measuring. – Bill Karwin Jul 15 '22 at 16:35
  • @AndreySh - Cardinality does _not_ factor into this or similar index questions. Bill's answer is correct. Think of it from the metric of how many 'rows' in the index need to be looked at. If necessary, then look at how spread out those rows are, especially in light of the blocking in InnoDB. – Rick James Aug 15 '22 at 14:26
0

I agree with the answer above. One clarification though is that you want to have ita hash index not btree index. It should work faster. The hash index wouldn't work well with any queries that involve inequality such as <=

  • The answers are sorted by votes, so "above answer" means nothing. Also, this should have been a comment on that answer. You may not be able to leave a comment yet, you just have to be patient. – Rohit Gupta Jul 15 '22 at 03:43
  • MySQL's default storage engine, InnoDB, does not support hash indexes. – Bill Karwin Jul 15 '22 at 16:07
  • A hash index [though not available] would fail poorly if it had a lot of collisions. – Rick James Aug 15 '22 at 14:27