Group by having max time without subquery

Question

I was wondering if someone can assist me with the following. Every business can have multiple notes. I need to get ONLY the notes with the most recent date, (one) for every business. This is what I tried, however this seems to produce incorrect results:

SELECT * FROM note n
GROUP BY business_id
HAVING MAX(`time`)

Is it possible to accomplish this without the use of a subquery?

I appreciate any suggestions, thanks in advance!

Similiar: http://stackoverflow.com/questions/2129693/mysql-using-limit-within-group-by-to-get-n-results-per-group — Daniel W., Jan 08 '14 at 15:38
The manual provides a solution, but what's wrong with subqueries? — Strawberry, Jan 08 '14 at 15:39
Because providing the last note for every business is part of the application requirements that I must fulfill. — AnchovyLegend, Jan 08 '14 at 15:40
Strawberry, I am working with a very large dataset. A subquery approach will work as well, if it is not horrible in performance. I was just wondering if it is possible to accomplish this without one, but any solution will work really. — AnchovyLegend, Jan 08 '14 at 15:43
Both methods provided by Gordon are likely to be orders of magnitude faster than any 'no subquery' approach. Appropriately indexed, they *should* both be fast, but for aesthetic reasons I prefer the uncorrelated subquery (or 'traditional') method. — Strawberry, Jan 09 '14 at 09:56

score 2 · Accepted Answer · answered Jan 08 '14 at 15:42

2

Here is one method that might be relatively efficient with an index on note(business_id, time):

SELECT n.*
FROM note n
WHERE not exists (select 1
                  from note n2
                  where n2.business_id = n.business_id and
                        n2.`time` > n.`time
                 );

The traditional method also works:

select n.*
from note n join
     (select business_id, max(time) as maxtime
      from note
      group by business_id
     ) nmax
     on n.business_id = nmax.business_id and n.`time` = nmax.maxtime;

But it has to do the aggregation as a separate step. The first version should just be able to resolve the where clause using the index.

answered Jan 08 '14 at 15:42

Gordon Linoff

1,242,037
58
646
786

Thanks for the reply Gordon. This may work but I have to say, wow, performance is horrible. 1 minute + and it is still not done querying (dataset is large 50k+). I don't think I can use this. Thanks anyways. – AnchovyLegend Jan 08 '14 at 15:46
The traditional method from your above approaches is much better in terms of performance, will test and post back, I appreciate the help! – AnchovyLegend Jan 08 '14 at 15:49
@AnchovyLegend . . . That is interesting. Are you sure you have the index defined as specified? – Gordon Linoff Jan 08 '14 at 15:55
+1, ended up using the traditional method. Thank you! – AnchovyLegend Jan 08 '14 at 18:59
@AnchovyLegend . . . By the way, thank you for the feedback on performance. I had thought the first method would be better with the appropriate index. – Gordon Linoff Jan 08 '14 at 21:14

score 0 · Answer 2 · answered Jan 08 '14 at 15:46

0

You could use a LEFT JOIN:

SELECT n.*
FROM
  note n LEFT JOIN note n1
  ON n.business_id=n1.business_id
     AND n.`time` < n1.`time`
WHERE
  n1.business_id IS NULL

answered Jan 08 '14 at 15:46

fthiella

48,073
15
90
106

Thanks for the reply. This may work, however, it is horrible with performance, I cannot use it. – AnchovyLegend Jan 08 '14 at 15:50
@AnchovyLegend do you have any index on your table? but yes, subqueries are not always slower, sometimes they might perform better than a join... – fthiella Jan 08 '14 at 15:52
Of course, the table is indexed. Please edit and add a subquery solution, if you have one. I appreciate the help. – AnchovyLegend Jan 08 '14 at 15:55
This *is* the classic solution. I was once such a strong advocate of it that it's even referred to sometimes as the 'strawberry method/query' - although I certainly had nothing to do with its invention! But times change. The uncorrelated subquery method is almost always fast*er* (but not always fast*est*) – Strawberry Jan 09 '14 at 10:03

Group by having max time without subquery

2 Answers2