0

I have this table which I'm trying to select from and to date. The query took 2 min to run on 4 million records. I'm not sure how much more I can squeeze out of this query.

    SELECT c.fk_id, c.from_date, c.fk_pb, MIN(o.from_date) AS to_date  
    FROM TABLE_X c  
        INNER JOIN TABLE_X o ON c.fk_id =  o.fk_id AND c.fk_pb = o.fk_pb  
WHERE o.from_date > c.from_date  
        GROUP BY c.fk_id, c.from_date, c.fk_pb

There are indexes on from_date, fk_pb and fk_id already.

The schema is like this.

+-----------------------------+---------------+------+-----+---------+-------+
| Field                       | Type          | Null | Key | Default | Extra |
+-----------------------------+---------------+------+-----+---------+-------+
| FK_ID                       | int(11)       | YES  | MUL | NULL    |       |
| FK_PB                       | int(11)       | YES  | MUL | NULL    |       |
| FROM_DATE                   | date          | YES  | MUL | NULL    |       |
| TO_DATE                     | date          | YES  |     | NULL    |       |
+-----------------------------+---------------+------+-----+---------+-------+

I know I should not use self-join at all in MySQL, but the data comes like this and I'm trying to find the best way to select from and to date out of this table. If there's anything else I could do to make this one faster that'd be great.

Thanks a lot.

UPDATED

+----+-------------+-------+------+----------------------------------------------------------------------+-------------------------+---------+----------------------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys                                                        | key                     | key_len | ref                                    | rows    | Extra                                        |
+----+-------------+-------+------+----------------------------------------------------------------------+-------------------------+---------+----------------------------------------+---------+----------------------------------------------+
|  1 | SIMPLE      | c     | ALL  | IDX_FK_PB,IDX_FK_ID,IDX_FRM_DATE                                     | NULL                    | NULL    | NULL                                   | 4527750 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | o     | ref  | IDX_FK_PB,IDX_FK_ID,IDX_FRM_DATE                                     | IDX_FK_ID               | 5       | db.c.FK_ID                             |     110 | Using where                                  |
+----+-------------+-------+------+----------------------------------------------------------------------+-------------------------+---------+----------------------------------------+---------+----------------------------------------------+
toy
  • 11,711
  • 24
  • 93
  • 176
  • The `WHERE` should also be in the `ON`, but I assume MySQL already optimizes that. What does `EXPLAIN` say? – Wrikken May 14 '13 at 22:06
  • Just updated the EXPLAIN command. – toy May 14 '13 at 22:16
  • With some DB engines, it is more efficient to perform a `SELECT TOP 1 * from_date ORDER BY from_date` than to perform a min(from_date). Not sure if MySQL is one of these, but you might try it. (Use the `LIMIT = 1 syntax in MySQL instead of the `TOP 1` of course. – Pieter Geerkens May 14 '13 at 22:20
  • 1
    @toy: that `IDX_FK_ID` key... I _hope_ it's an index on _all_ columns? (`INDEX(FK_ID, FK_PB,FROM_DATE)`) – Wrikken May 14 '13 at 22:26
  • Each column has its own index. Is there a difference between INDEX(FK_ID,FK_PB,FROM_DATE) and INDEX(FK_ID), INDEX(FK_PB), INDEX(FROM_DATE) – toy May 14 '13 at 22:27
  • @Pieter not sure how LIMIT 1 would work here, because I need to show MIN from each row not just one result row. – toy May 14 '13 at 22:29
  • @Wrikken I just changed to composite indexes and it really speed up the query. I'm surprised, but thanks a lot. – toy May 14 '13 at 22:34
  • Can you put that as an answer so I can accept that. – toy May 14 '13 at 22:35
  • @toy: added, with a bit of background information. – Wrikken May 14 '13 at 22:47

1 Answers1

2

Adding an index to all relevant columns speeds this up:

INDEX(FK_ID, FK_PB,FROM_DATE)

Which performs better because:

  • MySQL can use the index for all the rows for c, so it doesn't need to go back to the table for this (adding a column not in the index would slow it down again a bit).
  • MySQL is pretty bad at index merging, and so often chooses not to use it (luckily), and when it does it's often suboptimal.
  • Well, an index covering all you search for (in this case, join on = searching for) is faster then MySQL electing one of the indexes (the most restrictive one, SHOW INDEX FROM tablename can show you the cardinality) on the separate columns to use and having to scan for the values in the other columns.
Wrikken
  • 69,272
  • 8
  • 97
  • 136