83

I have problem when executing this code:

SELECT * FROM tblpm n 
WHERE date_updated=(SELECT MAX(date_updated) 
FROM tblpm GROUP BY control_number 
HAVING control_number=n.control_number)

Basically, I want to return the most recent date for each control number. The query above returns correct output but it takes 37secs. before the output was shown.

Is there any other sql clause or command that can execute faster than the query above?

starball
  • 20,030
  • 7
  • 43
  • 238
J-J
  • 1,063
  • 5
  • 22
  • 47

4 Answers4

172

Putting the subquery in the WHERE clause and restricting it to n.control_number means it runs the subquery many times. This is called a correlated subquery, and it's often a performance killer.

It's better to run the subquery once, in the FROM clause, to get the max date per control number.

SELECT n.* 
FROM tblpm n 
INNER JOIN (
  SELECT control_number, MAX(date_updated) AS date_updated
  FROM tblpm GROUP BY control_number
) AS max USING (control_number, date_updated);
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 8
    mysql doc -> http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html – Packet Tracer Jun 17 '14 at 12:28
  • @PacketTracer, thanks for the doc link! And yet, this question has been asked more than 900 times on StackOverflow since I created the [tag:greatest-n-per-group] tag. – Bill Karwin Jun 17 '14 at 19:05
  • 6
    and only 8 votes up on your answer...should be annoying to a percona db master such as you ;), thanks to you, you helped me solve a silly MySQL SP bug a year ago or so. People like you make Stack Overflow probably the best resource for computer nerds. – Packet Tracer Jun 18 '14 at 10:50
  • 6
    @PacketTracer, thank you for the compliment, the points are fun but I'm more concerned with helping people get good answers and be more successful. Besides, this question has been viewed over 9,300 times so far. I think I may have helped a lot of people even if they didn't take the time to give an upvote. – Bill Karwin Jun 18 '14 at 15:10
  • 2
    I am using this query with my own tables and columns but mysql says: Unknown column 'date_updated' in 'from clause' any idea why? – Wikunia Jul 17 '14 at 18:16
  • 1
    @Wikunia, the query above is an example to demonstrate the technique. You would have to replace table names and column names to match those in your own database. Besides that, I can't guess how you have written your query. I suggest you post a new question, and include the output of `SHOW CREATE TABLE` for each of the tables referenced in your query. – Bill Karwin Jul 17 '14 at 18:22
  • 16
    Anyone who's wondering what `USING` does, it's just a substitution to pairing columns with `ON` - so in this example, the snippet `USING (control_number, date_updated)` is the equivalent to `ON max.control_number = n.control_number AND max.date_updated = n.date_updated`. This is a good-to-know when working with Frameworks like Zend, where it uses the latter format instead of the former. – Smithee Aug 10 '18 at 17:59
24

There's no need to group in that subquery... a where clause would suffice:

SELECT * FROM tblpm n
WHERE date_updated=(SELECT MAX(date_updated)
    FROM tblpm WHERE control_number=n.control_number)

Also, do you have an index on the 'date_updated' column? That would certainly help.

Micah Hahn
  • 400
  • 2
  • 11
  • 9
    If the date updated is same (tie), it shows both records when using this solution, so you have to add `GROUP BY n. control_number` – radtek Jul 15 '16 at 02:50
  • 4
    don't use this technique if you need performance, read Bill answer – mrbarletta Feb 06 '20 at 21:46
0

Another way that doesn't use group by:

SELECT * FROM tblpm n 
  WHERE date_updated=(SELECT date_updated FROM tblpm n 
                        ORDER BY date_updated desc LIMIT 1)
  • 2
    This might work, but it was downvoted because it is a horribly inefficient method to do this. – Tony Laidig Mar 28 '18 at 18:19
  • Not sure whether this really is to horrible since we do not know how the compiler deals with this. At least in [this example](https://stackoverflow.com/a/41408193/11154841), when trying to get the max of the count(*) of each "grouped by", it does not seem to have been too slow. The grouping might be guessed by the compiler. Perhaps this answer is as fast as the accepted answer and there is just nobody who has tested it. – questionto42 May 05 '22 at 17:35
-10

Fast and easy with HAVING:

SELECT * FROM tblpm n 
FROM tblpm GROUP BY control_number 
HAVING date_updated=MAX(date_updated);

In the context of HAVING, MAX finds the max of each group. Only the latest entry in each group will satisfy date_updated=max(date_updated). If there's a tie for latest within a group, both will pass the HAVING filter, but GROUP BY means that only one will appear in the returned table.

Phiter
  • 14,570
  • 14
  • 50
  • 84
jaiwithani
  • 13
  • 1
  • actually i fear this does not work as expected, I get an empty result set for this query: `select * from order_order_status where order_order_status.order_id = 1 group by order_order_status.order_id having order_order_status.created_at = max(order_order_status.created_at) ` table has those entries: `INSERT INTO order_order_status (id, order_id, created_at) VALUES (1, 1, '2016-10-01 01:57:37'), (2, 2, '2016-10-01 01:57:54'), (3, 3, '2016-10-02 02:12:49'), (4, 3, '2016-10-02 02:14:19'), (5, 3, '2016-10-02 04:18:07'), (8, 1, '2016-10-03 01:22:53');` – Toskan Oct 03 '16 at 01:32
  • 1
    The max in having clause will give you rows with max date_updated in whole table – zookastos Dec 21 '16 at 22:07