SELECT
  COUNT(CASE WHEN VALUE = 1 THEN 1 END) AS score_1,
  COUNT(CASE WHEN VALUE = 2 THEN 1 END) AS score_2,
  COUNT(CASE WHEN VALUE = 3 THEN 1 END) AS score_3,
  COUNT(CASE WHEN VALUE = 4 THEN 1 END) AS score_4,
  COUNT(CASE WHEN VALUE = 5 THEN 1 END) AS score_5,
  COUNT(CASE WHEN VALUE = 6 THEN 1 END) AS score_6,
  COUNT(CASE WHEN VALUE = 7 THEN 1 END) AS score_7,
  COUNT(CASE WHEN VALUE = 8 THEN 1 END) AS score_8,
  COUNT(CASE WHEN VALUE = 9 THEN 1 END) AS score_9,
  COUNT(CASE WHEN VALUE = 10 THEN 1 END) AS score_10
FROM
  `answers`
WHERE
`created_at` BETWEEN '2017-01-01 00:00:00' AND '2019-11-30 23:59:59' 

Is there a way to optimize this query, because I have 4 million answer records in my DB, and it runs very slowly?

Hovo Asatryan
  • What's very slowly? Do you have an index on created_at? And I suspect sum would be better than count. – P.Salmon Feb 19 '20 at 13:16
  • Use `SELECT COUNT() GROUP BY VALUE`, then pivot it using CASEs; a sketch follows these comments. – Akina Feb 19 '20 at 13:18
  • Unlikely that the `CASE` statement is the problem here. If indexes are lacking, your request involves a full table scan, that is 4 million records. Add `explain` in front of your statement to see the execution plan. Also, the resultset may be heavy in terms of data since your query spans almost 3 years. And the data has to travel over the network to reach your front-end. – Kate Feb 19 '20 at 16:08
  • No index is likely to help on any variation of this query. – Rick James Feb 19 '20 at 20:13
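A sketch of Akina's suggestion, assuming the same `answers` table and `value` column as in the question: let MySQL aggregate once per value (at most 10 groups), then pivot those counts into columns.

-- Aggregate first, then pivot; ELSE 0 keeps a score at 0 when a value is absent in the range.
SELECT
  SUM(CASE WHEN `value` = 1  THEN ct ELSE 0 END) AS score_1,
  SUM(CASE WHEN `value` = 2  THEN ct ELSE 0 END) AS score_2,
  SUM(CASE WHEN `value` = 3  THEN ct ELSE 0 END) AS score_3,
  SUM(CASE WHEN `value` = 4  THEN ct ELSE 0 END) AS score_4,
  SUM(CASE WHEN `value` = 5  THEN ct ELSE 0 END) AS score_5,
  SUM(CASE WHEN `value` = 6  THEN ct ELSE 0 END) AS score_6,
  SUM(CASE WHEN `value` = 7  THEN ct ELSE 0 END) AS score_7,
  SUM(CASE WHEN `value` = 8  THEN ct ELSE 0 END) AS score_8,
  SUM(CASE WHEN `value` = 9  THEN ct ELSE 0 END) AS score_9,
  SUM(CASE WHEN `value` = 10 THEN ct ELSE 0 END) AS score_10
FROM (
  SELECT `value`, COUNT(*) AS ct
  FROM `answers`
  WHERE `created_at` BETWEEN '2017-01-01 00:00:00' AND '2019-11-30 23:59:59'
  GROUP BY `value`
) AS per_value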

3 Answers


Try running this one time to create an index:

CREATE INDEX ix_ca ON answers(created_at)

That should speed your query up. If you are curious about why, see here:

What is an index in SQL?

mikeb
  • That index _probably won't_ be used -- assuming that a lot of the table is in that date range, hence it would be faster to ignore the index and simply scan the table. (For a _short_ time span, the index would help a lot.) – Rick James Feb 19 '20 at 20:24

You could try adding a redundant composite index:

CREATE INDEX idx1 ON answers(created_at, value)

With this covering index, the query can be resolved from the index content alone, without accessing the table data.
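One way to verify that (a sketch, once the composite index exists): prefix the query with EXPLAIN and check the Extra column; "Using index" means the scan is index-only. Two of the ten score columns are enough to see the plan, since they reference the same columns as the full query:

EXPLAIN
SELECT
  COUNT(CASE WHEN VALUE = 1 THEN 1 END) AS score_1,
  COUNT(CASE WHEN VALUE = 10 THEN 1 END) AS score_10
FROM `answers`
WHERE `created_at` BETWEEN '2017-01-01 00:00:00' AND '2019-11-30 23:59:59'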

ScaisEdge

Want it to be 10 times as fast? Use the Data Warehousing technique of building and maintaining a "Summary table". In this example the summary table might be:

CREATE TABLE subtotals (
    dy DATE NOT NULL,
    `value` ... NOT NULL,   -- TINYINT UNSIGNED ?
    ct SMALLINT UNSIGNED NOT NULL, -- this is 2 bytes, max 65K; change if might be bigger
    PRIMARY KEY(value, dy)  -- or perhaps the opposite order
) ENGINE=InnoDB

Each night you summarize the day's data and insert 10 new rows (one per value) into subtotals.
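A minimal sketch of that nightly job, assuming it runs shortly after midnight and that created_at records when the answer was inserted:

-- Roll up yesterday's answers into one row per value.
INSERT INTO subtotals (dy, `value`, ct)
SELECT DATE(created_at), `value`, COUNT(*)
FROM answers
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
  AND created_at <  CURDATE()
GROUP BY DATE(created_at), `value`

Running the same SELECT once over the full table, without the date filter, back-fills the historical rows.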

Then the "report" query becomes

SELECT
  SUM(CASE WHEN VALUE = 1 THEN ct END) AS score_1,
  SUM(CASE WHEN VALUE = 2 THEN ct END) AS score_2,
  SUM(CASE WHEN VALUE = 3 THEN ct END) AS score_3,
  SUM(CASE WHEN VALUE = 4 THEN ct END) AS score_4,
  SUM(CASE WHEN VALUE = 5 THEN ct END) AS score_5,
  SUM(CASE WHEN VALUE = 6 THEN ct END) AS score_6,
  SUM(CASE WHEN VALUE = 7 THEN ct END) AS score_7,
  SUM(CASE WHEN VALUE = 8 THEN ct END) AS score_8,
  SUM(CASE WHEN VALUE = 9 THEN ct END) AS score_9,
  SUM(CASE WHEN VALUE = 10 THEN ct END) AS score_10
FROM
  `subtotals`
WHERE `dy` >= '2017-01-01'
  AND `dy`  < '2019-12-01'

Based on what you have provided, there will be about 10K rows in subtotals (roughly 1,065 days × 10 values); that's a lot less to wade through than 4M rows. It might run more than 10 times as fast.

More discussion: http://mysql.rjweb.org/doc.php/summarytables

Rick James