1

I've got this very simple MySQL query:

SELECT target FROM table WHERE goal_id=1 AND year>=2015 AND year<=2020

The table has about 5 millions rows, though. And as a result, it's very slow (about 10 seconds).

What could I do to improve things? Would an index help, and if so on which column?

GMB
  • 216,147
  • 25
  • 84
  • 135
Hubert
  • 369
  • 3
  • 21

4 Answers4

3

For this query, you want the following index:

create index myindex on mytable(year, goal_id, target)

This gives you a covering index: all columns that come into play in the query are part of the index, so this gives the database a decent chance to execute the query by looking at the index only (without actually looking at the data).

The ordering of columns in the index is important: the first two columns correspond to the where predicates, and the last column is the column comes into play in the select clause.

Depending on the cardinality of your data, you might also want to try to invert the first two columns:

create index myindex on mytable(goal_id, year, target)

The base idea is that you want to put the more restrictive criteria first.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • 1
    Use `goalid` first. – The Impaler Apr 16 '20 at 17:06
  • @TheImpaler: I was editing while you were commenting... Actually, it depends on how data is spread in the columns (eg if there are only two distinct `goalid`s and `year` spread over 100 disctinct values, you want `year` first). OP did not share that information. – GMB Apr 16 '20 at 17:10
  • Does include `target` field into index give some performance improvement? – Slava Rozhnev Apr 16 '20 at 17:11
  • @SlavaRozhnev: it does... that's the concept of *covering* index, which is explained in my answer). – GMB Apr 16 '20 at 17:12
  • 2
    Oh yes, it does. That's called a "covering index". It prevents MySQL's "dual index lookup" issue. Not needed, but if you need performance, do it. – The Impaler Apr 16 '20 at 17:13
  • 1
    That's awesome, thanks for the tips! Creating the index reduced the query from 10s to 0.2s – Hubert Apr 16 '20 at 23:18
  • No. Put the `=` column first. That way it will touch only consecutive rows in the index. – Rick James Apr 17 '20 at 04:20
  • More on the cardinality wives' tale: https://stackoverflow.com/questions/50239658/higher-cardinality-column-first-in-an-index-when-involving-a-range – Rick James Apr 17 '20 at 05:07
2

An index on year, goal_id, and target:

ALTER TABLE table ADD INDEX index_name (goal_id, year, target)

As Rick James has pointed out, goal_id needs to be first in the composite index, because it's matched with an =, followed by your range column year, then followed by what you want to retrieve, i.e. target.

Hendrik
  • 5,085
  • 24
  • 56
1

The rule is simple. Thing(s) tested with = first. Then do at most one "range". (And BETWEEN is equivalent in performance to an equivalent pair of inequalities.)

More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

And, as GMB points out, extending the index to make it covering gives another boost. All the work is done in the index's BTree; no need to look in the data's BTree. So, in this order:

INDEX(goal_id, year, target)
Rick James
  • 135,179
  • 13
  • 127
  • 222
0

SELECT target FROM table WHERE goal_id=1 AND year BETWEEN 2015 AND 2020

you may find faster query completion by referring to year only one time.

Wilson Hauck
  • 2,094
  • 1
  • 11
  • 19
  • Thanks, I didn't know that query – Hubert Apr 16 '20 at 23:18
  • No. `BETWEEN` actually is turned into the other format when the statement is parsed or optimizer. See `EXPLAIN EXTENDED SELECT ...; SHOW WARNINGS;` – Rick James Apr 17 '20 at 04:22
  • @hubert Please post your SHOW CREATE TABLE table; for my own deeper analysis (we will all see your current table and index structure). Rick James, Thank you for the details and your deeper knowledge of how things work in MySQL. – Wilson Hauck Apr 17 '20 at 13:51