4

Which Composite Index would make this simple MySQL query faster, and how would I create that Composite Index?

SELECT * 
FROM  `Table1` 
WHERE  `col1` =  '145307'
AND  `col2` =  '0'
AND col3 NOT 
IN ( 130209, 130839 ) 
ORDER BY col4 DESC 
LIMIT 0 , 5

There is already an individual index on each column above (col1 to col4).


EDIT:

Results of SHOW CREATE TABLE:

CREATE TABLE `Table1` (  
 `primaryCol` int(11) NOT NULL AUTO_INCREMENT,
 `col3` int(11) DEFAULT '0',
 `col5` varchar(20) COLLATE utf8_bin DEFAULT NULL,
 `col1` int(11) DEFAULT '0',
 `col6` varchar(80) COLLATE utf8_bin DEFAULT NULL,
 `col7` text CHARACTER SET utf8,
 `col4` int(11) DEFAULT '0',
 `col8` char(1) COLLATE utf8_bin DEFAULT 'N',
 `col9` char(1) COLLATE utf8_bin DEFAULT 'N',
 `col2` tinyint(1) NOT NULL,
 `col10` tinyint(1) NOT NULL,
 `col11` smallint(6) NOT NULL,
 PRIMARY KEY (`primaryCol`),
 KEY `col5` (`col5`),
 KEY `col1` (`col1`),
 KEY `col3` (`col3`),
 KEY `col4` (`col4`),
 KEY `col8` (`col8`),
 KEY `col9` (`col9`),
 KEY `CompIndex1` (`col1`,`col8`,`col4`),
 KEY `col2` (`col2`),
 KEY `col10` (`col10`),
 KEY `col11` (`col11`),
 FULLTEXT KEY `col7` (`col7`)
) ENGINE=MyISAM AUTO_INCREMENT=4575350 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

Results of EXPLAIN EXTENDED:

id  select_type table   type    possible_keys   key key_len ref rows    filtered    Extra
1   SIMPLE  Table1  ref col1,col3,CompIndex1,col2   CompIndex1  5   const   226 100 Using where; Using filesort
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
ProgrammerGirl
  • 3,157
  • 7
  • 45
  • 82
  • 2
    can you display the table structure also a `explain your_query`? – jcho360 Dec 12 '12 at 19:29
  • What's the storage engine? – Jason McCreary Dec 12 '12 at 19:30
  • how slow is it now and over what data size? – Woot4Moo Dec 12 '12 at 19:31
  • 1
    How about not doing `SELECT *`? – Kermit Dec 12 '12 at 19:32
  • @njk sometimes select * is appropriate, especially when you need all the columns. – Woot4Moo Dec 12 '12 at 19:34
  • @Woot4Moo I beg to differ. Let's say you write a query to select all the columns and some new guy comes in and adds 5285320162 more columns. Or [a better explanation](http://stackoverflow.com/questions/321299/what-is-the-reason-not-to-use-select). – Kermit Dec 12 '12 at 19:42
  • @njk sure you may beg to differ on this, but what if i need all of those columns? They must be there for a reason, or perhaps there is an awful dba that likes nulls? – Woot4Moo Dec 12 '12 at 19:43
  • @Woot4Moo I highly doubt you read the link I provided in 41 seconds. – Kermit Dec 12 '12 at 19:44
  • @njk the person with the accepted answer is definitely generalizing, and clearly doesn't use indexes. You can profile on a select *. – Woot4Moo Dec 12 '12 at 19:45
  • @njk granted his argument could be that there is no where clause, which is even less probable. – Woot4Moo Dec 12 '12 at 19:47
  • While I understand the point, `SELECT *` is *not always evil*. For example, under the InnoDB engine, the full record (all columns) are leafs of the primary key index. – Jason McCreary Dec 12 '12 at 19:48
  • @jcho360: I've edited my answer to include `SHOW CREATE TABLE` and `EXPLAIN EXTENDED` as requested. Please advise. – ProgrammerGirl Dec 12 '12 at 20:27
  • @JasonMcCreary, `SELECT *` usually spoils the chance of using a covering index. Also long BLOB/TEXT/VARCHAR values can overflow into auxiliary pages. – Bill Karwin Dec 12 '12 at 20:40

3 Answers3

7

I would suggest an index on (col1, col2, col3).

mysql> CREATE INDEX NewIndex ON Table1 (col1,col2,col3);

mysql> EXPLAIN SELECT *  FROM  `Table1`  WHERE  `col1` =  '145307' 
AND  `col2` =  '0' AND col3 NOT  IN ( 130209, 130839 )  
ORDER BY col4 DESC  LIMIT 0 , 5\G

           id: 1
  select_type: SIMPLE
        table: Table1
         type: ref
possible_keys: col1,col3,CompIndex1,col2,NewIndex
          key: NewIndex
      key_len: 6
          ref: const,const
         rows: 1
        Extra: Using where; Using filesort

Your condition on col3 is not an equality comparison, it's a range comparison, and that should be the last column in the index.

Unfortunately, this means that you can't get rid of the "using filesort" in the EXPLAIN plan. In general, you can't optimizing sorting with indexes if you also have a range comparison on a different column.

But you can at least use the three-column index to narrow down the search so the filesort will have to do work on a smaller set of rows, and then it'll probably be able to do it in memory.

See also my presentation How to Design Indexes, Really.

Jason McCreary
  • 71,546
  • 23
  • 135
  • 174
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • and what do you think about all those index? http://stackoverflow.com/questions/4120160/mysql-too-many-indexes and the `select *....` shouldn't be change? – jcho360 Dec 12 '12 at 20:44
  • @jcho360, I disagree with the answer to that post. And I didn't comment on the `SELECT *` because it wasn't part of the question. I agree that another way to benefit from indexes is to make an *index-only query*, but I'll assume the OP really needs all the column in this case. – Bill Karwin Dec 12 '12 at 20:52
  • + for a thorough answer. Is there an opportunity for optimization through a subquery or `EXISTS` instead of the `NOT IN`? – Jason McCreary Dec 12 '12 at 20:59
  • @JasonMcCreary, sorry, I don't think so. Those would still qualify as range comparisons. You can use an index to benefit the range comparison or the sorting, but not both. – Bill Karwin Dec 12 '12 at 21:12
  • @BillKarwin: Excellent and thorough answer, thank you! Bonus points for your presentation, it's clear, concise, and informative. Thanks again! – ProgrammerGirl Dec 12 '12 at 21:16
  • @BillKarwin I wish I could listen to the presentation behind the slides. – Kermit Dec 13 '12 at 00:43
  • @njk, I gave a webinar with a version of those slides in October: http://www.percona.com/webinars/tools-and-techniques-index-design – Bill Karwin Dec 13 '12 at 00:48
  • @BillKarwin Cheers. Looking forward to watching it. – Kermit Dec 13 '12 at 03:04
  • @Bill Karwin, I see. I would appreciate your input on [an unanswered SQL question](http://stackoverflow.com/questions/13750475/sql-performance-union-vs-or). – Jason McCreary Dec 13 '12 at 15:22
  • Depending on the number of unique values in col3, it might not give you any benefits of including it in the index. – Ztyx Jun 28 '13 at 13:02
  • @Ztyx, right, all B-tree searches give a greater relative benefit the more they reduce the results. If the cardinality of the index isn't good, the optimizer may decide that there's no benefit in using it, even to the point of doing a table-scan anyway. – Bill Karwin Jun 28 '13 at 15:24
1

If you're going to create a composite index, then it would be redundant to have indexes on individual columns which are members of the composite index. Figure out which queries you're going to most often run, the cardinality of each column involved in the queries, the approximate number of rows the queries will return, total rows in the table, is the table oftenly updated, etc. Remember that indexes cost against insertions and updates.

Joe R.
  • 2,032
  • 4
  • 36
  • 72
1

The current best answer is the most horrible one. Everytime you see a "filesort" in EXPLAIN, it's going to be horribly slow, because MySQL has to create temporary files and doesn't use the index for sorting!

Indexes are like phone books. In your case, the correct index would be:

(col4, col1, col2, col3, primaryCol)

The order matters! The leftmost part of your composite index must always be the columns that you want to order by. Then, add other columns that you use in your WHERE clause. Finally, for MyISAM, you need to add the primary key, too (InnoDB does that automatically).

Also, you need to change your query so the ordering/filtering, and getting full details, are seperated:

SELECT * FROM `Table1` JOIN (
  SELECT primaryKey FROM  `Table1` 
  WHERE  `col1` =  '145307'
  AND  `col2` =  '0'
  AND col3 NOT 
  IN ( 130209, 130839 ) 
  ORDER BY col4 DESC 
  LIMIT 0 , 5
) foo ON Table1.primaryKey=foo.primaryKey

Checking this with EXPLAIN, you will see proper index usage, for the ordering and filtering. The result ids are then joined again with the table (in another fast index operation) to get the full details.