5

I have a table for log entries, and a description table for the about 100 possible log codes:

CREATE TABLE `log_entries` (
  `logentry_id` int(11) NOT NULL AUTO_INCREMENT,
  `date` datetime NOT NULL,
  `partner_id` smallint(4) NOT NULL,
  `log_code` smallint(4) NOT NULL,
  PRIMARY KEY (`logentry_id`),
  KEY `IX_code` (`log_code`),
  KEY `IX_partner_code` (`partner_id`,`log_code`)
) ENGINE=MyISAM ;

CREATE TABLE IF NOT EXISTS `log_codes` (
  `log_code` smallint(4) NOT NULL DEFAULT '0',
  `log_desc` varchar(255) DEFAULT NULL,
  `category_overview` tinyint(1) NOT NULL DEFAULT '0',
  `category_error` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`log_code`),
  KEY `IX_overview_code` (`category_overview`,`log_code`),
  KEY `IX_error_code` (`category_error`,`log_code`)
) ENGINE=MyISAM ;

The follwing query (matching 10k of 20k rows) executes in 0.0034 sec (using LIMIT 0,20):

SELECT log_entries.date, log_codes.log_desc FROM log_entries 
INNER JOIN log_codes ON log_codes.log_code = log_entries.log_code 
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1;

But when adding ORDER BY log_entries.logentry_id DESC, which is of course necessary, it slows down to 0.6 sec. Probably because "Using temporary" is used on the log_codes table? Removing the indexes actually makes the query perform faster, but still slow (0.3 sec).

EXPLAIN output of the query without ORDER BY:

+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+
| id | select_type | table       | type | possible_keys              | key              | key_len | ref                      | rows | Extra       |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+
|  1 | SIMPLE      | log_codes   | ref  | PRIMARY,IX_overview_code   | IX_overview_code | 1       | const                    |   56 |             |
|  1 | SIMPLE      | log_entries | ref  | IX_code,IX_partner_code    | IX_partner_code  | 7       | const,log_codes.log_code |   25 | Using where |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+

And including the ORDER BY:

+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+
| id | select_type | table       | type | possible_keys              | key              | key_len | ref                      | rows | Extra                           |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+
|  1 | SIMPLE      | log_codes   | ref  | PRIMARY,IX_overview_code   | IX_overview_code | 1       | const                    |   56 | Using temporary; Using filesort |
|  1 | SIMPLE      | log_entries | ref  | IX_code,IX_partner_code    | IX_partner_code  | 7       | const,log_codes.log_code |   25 | Using where                     |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+

Any hints on how to get this query to perform faster? I can't see why "using temporary" should be needed, as the log codes should be chosen before fetching and sorting the appropiate log entries?

UPDATE @Eugen Rieck:

SELECT log_entries.date, lc.log_desc FROM log_entries INNER JOIN (SELECT log_desc, log_code FROM log_codes WHERE category_overview = 1) AS lc ON lc.log_code = log_entries.log_code WHERE log_entries.partner_id = 1 ORDER BY log_entries.logentry_id;
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+
| id | select_type | table       | type | possible_keys           | key              | key_len | ref               | rows | Extra                           |
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+
|  1 | PRIMARY     | <derived2>  | ALL  | NULL                    | NULL             | NULL    | NULL              |   57 | Using temporary; Using filesort |
|  1 | PRIMARY     | log_entries | ref  | IX_code,IX_partner_code | IX_partner_code  | 7       | const,lc.log_code |   25 | Using where                     |
|  2 | DERIVED     | log_codes   | ref  | IX_overview_code        | IX_overview_code | 1       |                   |   56 |                                 |
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+

UPDATE @RolandoMySQLDBA:

With my original indexes, ORDER BY date DESC:

SELECT log_entries.date, log_codes.log_desc FROM (SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries INNER JOIN (SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes USING (log_code) ORDER BY log_entries.date DESC;
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+
| id | select_type | table       | type | possible_keys    | key              | key_len | ref  | rows  | Extra                           |
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+
|  1 | PRIMARY     | <derived3>  | ALL  | NULL             | NULL             | NULL    | NULL |    57 | Using temporary; Using filesort |
|  1 | PRIMARY     | <derived2>  | ALL  | NULL             | NULL             | NULL    | NULL | 21937 | Using where; Using join buffer  |
|  3 | DERIVED     | log_codes   | ref  | IX_overview_code | IX_overview_code | 1       |      |    56 |                                 |
|  2 | DERIVED     | log_entries | ALL  | IX_partner_code  | NULL             | NULL    | NULL | 22787 | Using where                     |
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+

With your indexes, no ordering:

SELECT log_entries.date, log_codes.log_desc FROM (SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries INNER JOIN (SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes USING (log_code);
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+
| id | select_type | table       | type  | possible_keys         | key                   | key_len | ref  | rows  | Extra                          |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+
|  1 | PRIMARY     | <derived3>  | ALL   | NULL                  | NULL                  | NULL    | NULL |    57 |                                |
|  1 | PRIMARY     | <derived2>  | ALL   | NULL                  | NULL                  | NULL    | NULL | 21937 | Using where; Using join buffer |
|  3 | DERIVED     | log_codes   | index | IX_overview_code_desc | IX_overview_code_desc | 771     | NULL |    80 | Using where; Using index       |
|  2 | DERIVED     | log_entries | index | IX_partner_code_date  | IX_partner_code_date  | 15      | NULL | 22787 | Using where; Using index       |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+

With your indexes, ORDER BY date DESC:

SELECT log_entries.date, log_codes.log_desc FROM (SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries INNER JOIN (SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes USING (log_code) ORDER BY log_entries.date DESC;
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+
| id | select_type | table       | type  | possible_keys         | key                   | key_len | ref  | rows  | Extra                           |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+
|  1 | PRIMARY     | <derived3>  | ALL   | NULL                  | NULL                  | NULL    | NULL |    57 | Using temporary; Using filesort |
|  1 | PRIMARY     | <derived2>  | ALL   | NULL                  | NULL                  | NULL    | NULL | 21937 | Using where; Using join buffer  |
|  3 | DERIVED     | log_codes   | index | IX_overview_code_desc | IX_overview_code_desc | 771     | NULL |    80 | Using where; Using index        |
|  2 | DERIVED     | log_entries | index | IX_partner_code_date  | IX_partner_code_date  | 15      | NULL | 22787 | Using where; Using index        |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+

UPDATE @Joe Stefanelli:

SELECT log_entries.date, log_codes.log_desc FROM log_entries INNER JOIN log_codes ON log_codes.log_code = log_entries.log_code WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY date DESC;
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table       | type | possible_keys            | key             | key_len | ref                      | rows | Extra                                        |
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | log_codes   | ALL  | PRIMARY,IX_code_overview | NULL            | NULL    | NULL                     |   80 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | log_entries | ref  | IX_code,IX_code_partner  | IX_code_partner | 7       | log_codes.log_code,const |   25 | Using where                                  |
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
elaxsj
  • 474
  • 1
  • 5
  • 16
  • Please try `SELECT log_entries.date, lc.log_desc FROM log_entries INNER JOIN (SELECT log_desc, log_code FROM log_codes WHERE category_overview = 1) AS lc ON lc.log_code = log_entries.log_code WHERE log_entries.partner_id = 1 ORDER BY log_entries.logentry_id`and post back – Eugen Rieck Apr 18 '12 at 21:30
  • @Eugen Rieck Thanks. I've added EXPLAIN output of your query to the question. Query performance is about the same (0.6 sec on LIMIT 0,25). – elaxsj Apr 18 '12 at 22:24

3 Answers3

2

I think, most of problems here and in similar questions come from misunderstanding how MySQL (and other databases) uses indexes for sorting. The answer is: MySQL does not use indexes for sorting, it just can read data in the order of an index or in the opposite direction. If you happened to want the data to be oredered in the order of the currently used index - you are lucky, otherwise the result will be sorted (hence filesort in EXPLAIN)

That is order of the whole result mostly depends on which table was the first in the join. And if you look at your EXPLAIN you will see that the join starts from 'log_codes' table (because it is much smaller).

Basically, what you need is a composite index (partner_id, date) on 'log_entries', a covering composite index (log_code, category_overview, log_desc) for 'log_codes', change 'INNER JOIN' to 'STRAIGHT_JOIN' to force the join order, and order by 'date' DESC (this index will fortunately be covering too).

UPD1: I am sorry, I mistyped the index for the first table: it should be (partner_id, log_code, date).

But I still struggle to understand why MySQL choose to "use temporary" on the log_codes table (and 100x query time) when I try to sort on a column in another table?

MySQL can either directly output data as long as you agree with the ordering in which it gets it, or put data in a temporary table, apply sorting and output then. When you order by a field from any non-first table in joins, MySQL has to sort data (not just output in the order of an index) and to sort data it needs a temporary table.

But as I get further into the dataset it is slower (6 sec for LIMIT 50000,25). Do you know why?

To output rows 50000,25 MySQL anyway needs to fetch the first 50000 and skip them. Since I missed a column in the index, MySQL not just skanned the index but for each item made an additional on disc lookup for log_code value. With the covering index that should be much faster, since all data can be fetched from the index.

UPD2: try to force the index:

SELECT log_entries.date, log_codes.log_desc
FROM log_entries FORCE INDEX (IX_partner_code_date)
STRAIGHT_JOIN log_codes
  ON log_codes.log_code = log_entries.log_code
WHERE log_entries.partner_id = 1
  AND log_codes.category_overview = 1
ORDER BY log_entries.date DESC;
newtover
  • 31,286
  • 11
  • 84
  • 89
  • Valuable info. But I still struggle to understand why MySQL choose to "use temporary" on the log_codes table (and 100x query time) when I try to sort on a column in another table? Tested STRAIGHT_JOIN, and it seems to give very good performance (0.0058 sec for LIMIT 0,25)! But as I get further into the dataset it is slower (6 sec for LIMIT 50000,25). Do you know why? That is not a huge issue though, I can trade off performance for the oldest log entries for loading the most recent elements fast. The index (log_code, partner_id) performs pretty equal to (partner_id, date). – elaxsj Apr 19 '12 at 21:57
  • Thanks @newtower, this is very informative. I added a (partner_id, log_code, date) index. The index is used, but still slow. key_len is 2, so I guess the date part is not used? BTW, in the largest dataset that this query will be used, the cardinality of the index currently is (7,254,176904). I've heard that ideally the high cardinality fields should be at the beginning of the index, do you have any comments on that? – elaxsj Apr 20 '12 at 09:37
  • @elaxsj, `date` part is used for the selected values, there should be `using index` in EXPLAIN. BTW, I assume you order by `date` DESC. Concerning the cardinality, you are right, but this mostly affect the order in which the conditions are going to be applied. In this case, you first need to apply condition by `partner_id` and then for the rest rows check if the second condition is met in another table. – newtover Apr 20 '12 at 11:40
  • I'm ordering by `date` DESC, and have a `IX_partner_code_date(partner_id, log_code, date)` index. EXPLAIN output: `SIMPLE / log_entries / ref / IX_partner_code_logentry,IX_partner_code_date / IX_partner_code_date / 2 / const / 167457 / Using where; Using filesort`. Also tried to order by `logentry_id` DESC, using `IX_partner_code_logentry(partner_id, log_code, logentry_id)`, same result. Strange? Hopefully I'm able to sort this out, and the new query will work perfectly :) – elaxsj Apr 20 '12 at 11:59
  • Ah. Now I get it. I have some other fields in log_entries that I left out of the question to simplify, and I understand that the index needs to cover all these in order to sort by index. Probably better without a covering index then. However, I notice that the query performs best without any indexes on log_entries except PRIMARY (logentry_id). Does that imply that I should stick with only a PRIMARY index? Numbers follows... – elaxsj Apr 20 '12 at 15:01
  • Various indexes using `ORDER BY date DESC LIMIT 0,25`: (log_code):0.1884 sec; (partner_id):1.8092 sec; (partner_id, log_code):2.0036 sec; (log_code, partner_id):0.1859 sec; (PRIMARY):0.0044 sec /// Various indexes using `ORDER BY date DESC LIMIT 50000,25`: (log_code):1.3655 sec; (partner_id):2.9922 sec; (partner_id, log_code):3.1868 sec; (log_code, partner_id):1.3720 sec; (PRIMARY):2.7080 sec – elaxsj Apr 20 '12 at 15:01
  • @elaxsj, certainly. If it works better without indexes, you should betterremove them ) – newtover Apr 20 '12 at 15:09
1

You are going to need two things

REFACTOR THE QUERY

SELECT log_entries.date, log_codes.log_desc FROM 
(SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries
INNER JOIN
(SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes
USING (log_code); 

CREATE INDEXES TO SUPPORT SUBQUERIES AND REDUCE TABLE ACCESS

Before creating these indexes, run these

SELECT COUNT(1) rowcount,partner_id FROM log_entries GROUP BY partner_id;
SELECT COUNT(1) rowcount,category_overview FROM log_codes GROUP BY category_overview;

If none of the counts from all possible partner_id values exceed 5% of the log_entries table, create this index

ALTER TABLE log_entries ADD INDEX (partner_id,log_code,date);

If none of the counts from all possible category_overview values exceed 5% of the log_codes table, create this index

ALTER TABLE log_codes ADD INDEX (category_overview,log_code,log_desc);

Give it a Try !!!

Please try this refactored query with LIMIT 0,25 included

SELECT log_entries.date, log_codes.log_desc FROM 
(
    SELECT A.log_code FROM 
    (SELECT log_code FROM log_entries WHERE partner_id = 1) A INNER JOIN
    (SELECT log_code FROM log_codes WHERE category_overview = 1) B USING (log_code)
    LIMIT 0,25
) log_code_keys
INNER JOIN log_entries USING (log_code)
INNER JOIN log_code USING (log_code);
RolandoMySQLDBA
  • 43,883
  • 16
  • 91
  • 132
  • Thanks! I did not add the indexes, as about 40% of the log codes have category_overview, and at the moment 90% of the log entries have partner 1 (more partners will be added and it will even out). Gave the query a try, it's still slow, and I see "Using temporary" on the last derived query. I have added the EXPLAIN output of the query to my question. – elaxsj Apr 18 '12 at 22:26
  • Just to experiment, please make the indexes anyway and run it. – RolandoMySQLDBA Apr 19 '12 at 02:03
  • Indexed added, still "using temporary"/slow when ordering by date desc. I've updated my question with the EXPLAIN output. – elaxsj Apr 19 '12 at 17:34
  • The latest query returned 60950 rows, while there should only 9743 matching rows. My SQL knowledge is limited, so I can't see what is wrong with the query. Also remember that I need to order by date desc. Really appreciate your effort! – elaxsj Apr 19 '12 at 18:09
0

I'd start by reversing the columns in the IX_partner_code and IX_overview_code indexes. That should make them better suited to support both the JOIN and the WHERE clause.

...
KEY `IX_code_partner` (`log_code`,`partner_id`)
...
KEY `IX_code_overview` (`log_code`,`category_overview`),
...
Joe Stefanelli
  • 132,803
  • 19
  • 237
  • 235
  • Thanks. I tried to change the indexes. "Using temporary" still present, and seems like the code/overview index was not used. – elaxsj Apr 18 '12 at 22:30