0

This is for InnoDB with MySQL 5.7.

I have a set of 4 related cascading queries:

SELECT DISTINCT A, COUNT(*) FROM MYTABLE
    WHERE D IN ? AND A > ?
    GROUP BY A ORDER BY A LIMIT 100

SELECT DISTINCT B, COUNT(*) FROM MYTABLE
    WHERE A = ? AND D IN ? AND B > ?
    GROUP BY B ORDER BY B LIMIT 100

SELECT DISTINCT C, COUNT(*) FROM MYTABLE
    WHERE A = ? AND B = ? AND D IN ? AND C > ?
    GROUP BY C ORDER BY C LIMIT 100

SELECT E, F, G, H FROM MYTABLE
    WHERE A = ? AND B = ? AND C = ? AND D IN ? AND ID > ?
    ORDER BY ID LIMIT 100

What is the minimum set of index(es) such that all the queries can use one of the index(es) to prune by each of their WHERE clauses and also use it/them to speed up the ORDER BY?

From what I understand about composite indexes, I'll need:

CREATE INDEX INDEX01 ON MYTABLE (D, A)

CREATE INDEX INDEX02 ON MYTABLE (A, D, B)

CREATE INDEX INDEX03 ON MYTABLE (A, B, D, C)

CREATE INDEX INDEX04 ON MYTABLE (A, B, C, D)

(ID is the primary key column)

Is this correct?

I figure if I reorder the WHERE clauses, I can possibly make do with just one composite index:

SELECT DISTINCT A, COUNT(*) FROM MYTABLE
    WHERE D IN ? AND A > ?
    GROUP BY A ORDER BY A LIMIT 100

SELECT DISTINCT B, COUNT(*) FROM MYTABLE
    WHERE D IN ? AND A = ? AND B > ?
    GROUP BY B ORDER BY B LIMIT 100

SELECT DISTINCT C, COUNT(*) FROM MYTABLE
    WHERE D IN ? AND A = ? AND B = ? AND C > ?
    GROUP BY C ORDER BY C LIMIT 100

SELECT E, F, G, H FROM MYTABLE
    WHERE D IN ? AND A = ? AND B = ? AND C = ? AND ID > ?
    ORDER BY ID LIMIT 100

then I'll just need:

CREATE INDEX INDEX01 ON MYTABLE (D, A, B, C)

Is that correct?

However, I think that ordering the WHERE clauses this way would not be optimal. The reason for always trying to put the "IN" operation and the ">" operation as the last 2 WHERE clauses is:

  1. MySQL would need to do more work for "IN" (compare with multiple values) as compared to "=", and possibly (due to my dataset and what I'm filtering), less rows will be pruned by this clause.

  2. The ">" operation is mainly for pagination purposes. i.e. in certain cases, there will be little to no pruning due to this clause.

Is my understanding correct?

Rick James
  • 135,179
  • 13
  • 127
  • 222
Edwin Lee
  • 3,540
  • 6
  • 29
  • 36
  • It depends on your datatypes for a real-world healthy system. Are they thin or loaded up varchar(255) – Drew Jul 27 '16 at 06:47
  • Please see [this](http://stackoverflow.com/a/38002986) concerning the Optimizer and `IN` – Drew Jul 27 '16 at 06:49

1 Answers1

1

Do not do both DISTINCT and GROUP BY in the same query. Because of the aggregate (COUNT), you probably want GROUP BY, so toss DISTINCT.

For GROUP BY x ORDER BY x LIMIT 100, the following may help:

INDEX(x)  -- or INDEX(x, ...)

So, include that, just in case. By that I mean that the Optimizer may choose to use an index do handle the GROUP BY + ORDER BY + LIMIT instead of looking at the WHERE. If it decides to use the WHERE, then...

WHERE D IN ? AND A > ?
INDEX(D, A)

Can leapfrog ("MRR") the D's and scan the A's, but it cannot consume any GROUP BY or ORDER BY.

WHERE A = ? AND D IN ? AND B > ?
INDEX(A, D, B)

Put any '=' things first in the index. The logic for the rest is as above.

WHERE A = ? AND B = ? AND D IN ? AND C > ?
INDEX(A, B, D, C) or INDEX(B, A, D, C)

(Same logic)

WHERE A = ? AND B = ? AND C = ? AND D IN ? AND ID > ?
INDEX(A,B,C, -- in any order, then
      D, ID) -- at end, in this order.

So, for that set of 4 statements, I recommend 4 or 5 indexes, in the given orders:

INDEX(D, A)
INDEX(A, D, B)
INDEX(B, A, D, C)  -- I picked that one to get one starting with B
INDEX(c, B, A, D, ID)
INDEX(ID)  -- but don't add if you already have `PRIMARY KEY(ID)`

As a bonus, among those indexes, the first 3 SELECTs have "covering" indexes, which gives you an extra bonus. The last SELECT would need a 9-column index to "cover"; that is too much.

The order of AND'd things in a WHERE makes no difference. So, I think I can ignore the rest of your question.

(Caveat: Before about 5.6, the leapfrogging did not exist, so the "best" set of indexes would be something else.)

Rick James
  • 135,179
  • 13
  • 127
  • 222