3

I am trying to optimize a query on a mysql table I've created. I expect that there will be many many rows in the table. Looking at this question the accepted answer and the top voted answer suggests two different approaches. I wrote these two queries and want to know which one is more performant.

SELECT uv.* 
FROM UserVisit uv INNER JOIN 
    (SELECT ID,MAX(visitDate) visitDate 
     FROM UserVisit GROUP BY ID) last 
    ON (uv.ID = last.ID AND uv.visitDate = last.visitDate);

Running this with EXPLAIN yields:

+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref                            | rows | Extra       |
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL                           |    2 |             |
|  1 | PRIMARY     | uv         | eq_ref | PRIMARY       | PRIMARY | 11      | last.playscanID,last.visitDate |    1 |             |
|  2 | DERIVED     | UserVisit  | index  | NULL          | PRIMARY | 11      | NULL                           |    4 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
3 rows in set (0.01 sec)

And the other query:

SELECT lastVisits.* 
FROM ( SELECT * FROM UserVisit ORDER BY visitDate DESC ) lastVisits 
GROUP BY lastVisits.ID

Running that with EXPLAIN yields:

+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |    4 | Using temporary; Using filesort |
|  2 | DERIVED     | UserVisit  | ALL  | NULL          | NULL | NULL    | NULL |    4 | Using filesort                  |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
2 rows in set (0.00 sec)

I am uncertain how to interpret the result of the two EXPLAINs.
Which of these queries can I expect to be faster and why?

EDIT: This is the way UserVisit table looks:

+----------------+---------------------+------+-----+---------+-------+
| Field          | Type                | Null | Key | Default | Extra |
+----------------+---------------------+------+-----+---------+-------+
| ID             | bigint(20) unsigned | NO   | PRI | NULL    |       |
| visitDate      | date                | NO   | PRI | NULL    |       |
| visitTime      | time                | NO   |     | NULL    |       |
| analysisResult | decimal(3,2)        | NO   |     | NULL    |       |
+----------------+---------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Community
  • 1
  • 1
Emil L
  • 20,219
  • 3
  • 44
  • 65
  • Technically, the second query cannot reliably return the answer you seek. In practice, it works in all versions (that support subqueries) so far. Also in practice, it rarely outperforms the 1st query. – Strawberry Mar 04 '14 at 10:09
  • @Strawberry Could you elaborate on why the second query is unable to reliably return the correct answer? Is it because there is no guarantee that the top row will be selected by the `GROUP BY`? Also, could you explain why there would be little difference in performance for the querries? – Emil L Mar 04 '14 at 10:22
  • 1
    Precisely - although, as I say, in practice it ALWAYS does - and (being devil's advocate for a minute) Neville K's argument is somewhat spurious in that all kinds of things could change in future versions. I prefer to stick to the line that the second solution is undocumented (unless you count the comments section of the relevant page in the manual). I don't know enough (read "anything") about the mechanics of MySQL to say why performance differences will be negligible, but I've tested both queries on vast (indexed) tables. Definitely agree with NK about the compound key though. – Strawberry Mar 04 '14 at 10:30

1 Answers1

2

Firstly, you might want to read the manual on EXPLAIN. It's a dense read, but it should provide most of the information you want.

Secondly, as Strawberry says, the second query works by accident. The behaviour may change in future versions, and your query would not return an error, just different data. That's nearly always a bad thing.

Finally, the EXPLAIN suggests that version 1 will be faster. In EXTRA, it's saying it's using an index, which is much faster than filesort. Without a schema, it's hard to be sure, but I think you will also benefit from a compound key on ID and visitdate.

Sebastian Brosch
  • 42,106
  • 15
  • 72
  • 87
Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52