1

I was wondering if someone can assist me with the following. Every business can have multiple notes. I need to get ONLY the notes with the most recent date, (one) for every business. This is what I tried, however this seems to produce incorrect results:

SELECT * FROM note n
GROUP BY business_id
HAVING MAX(`time`)

Is it possible to accomplish this without the use of a subquery?

I appreciate any suggestions, thanks in advance!

AnchovyLegend
  • 12,139
  • 38
  • 147
  • 231
  • Similiar: http://stackoverflow.com/questions/2129693/mysql-using-limit-within-group-by-to-get-n-results-per-group – Daniel W. Jan 08 '14 at 15:38
  • The manual provides a solution, but what's wrong with subqueries? – Strawberry Jan 08 '14 at 15:39
  • Because providing the last note for every business is part of the application requirements that I must fulfill. – AnchovyLegend Jan 08 '14 at 15:40
  • Strawberry, I am working with a very large dataset. A subquery approach will work as well, if it is not horrible in performance. I was just wondering if it is possible to accomplish this without one, but any solution will work really. – AnchovyLegend Jan 08 '14 at 15:43
  • Both methods provided by Gordon are likely to be orders of magnitude faster than any 'no subquery' approach. Appropriately indexed, they *should* both be fast, but for aesthetic reasons I prefer the uncorrelated subquery (or 'traditional') method. – Strawberry Jan 09 '14 at 09:56

2 Answers2

2

Here is one method that might be relatively efficient with an index on note(business_id, time):

SELECT n.*
FROM note n
WHERE not exists (select 1
                  from note n2
                  where n2.business_id = n.business_id and
                        n2.`time` > n.`time
                 );

The traditional method also works:

select n.*
from note n join
     (select business_id, max(time) as maxtime
      from note
      group by business_id
     ) nmax
     on n.business_id = nmax.business_id and n.`time` = nmax.maxtime;

But it has to do the aggregation as a separate step. The first version should just be able to resolve the where clause using the index.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for the reply Gordon. This may work but I have to say, wow, performance is horrible. 1 minute + and it is still not done querying (dataset is large 50k+). I don't think I can use this. Thanks anyways. – AnchovyLegend Jan 08 '14 at 15:46
  • The traditional method from your above approaches is much better in terms of performance, will test and post back, I appreciate the help! – AnchovyLegend Jan 08 '14 at 15:49
  • @AnchovyLegend . . . That is interesting. Are you sure you have the index defined as specified? – Gordon Linoff Jan 08 '14 at 15:55
  • +1, ended up using the traditional method. Thank you! – AnchovyLegend Jan 08 '14 at 18:59
  • @AnchovyLegend . . . By the way, thank you for the feedback on performance. I had thought the first method would be better with the appropriate index. – Gordon Linoff Jan 08 '14 at 21:14
0

You could use a LEFT JOIN:

SELECT n.*
FROM
  note n LEFT JOIN note n1
  ON n.business_id=n1.business_id
     AND n.`time` < n1.`time`
WHERE
  n1.business_id IS NULL
fthiella
  • 48,073
  • 15
  • 90
  • 106
  • Thanks for the reply. This may work, however, it is horrible with performance, I cannot use it. – AnchovyLegend Jan 08 '14 at 15:50
  • @AnchovyLegend do you have any index on your table? but yes, subqueries are not always slower, sometimes they might perform better than a join... – fthiella Jan 08 '14 at 15:52
  • Of course, the table is indexed. Please edit and add a subquery solution, if you have one. I appreciate the help. – AnchovyLegend Jan 08 '14 at 15:55
  • This *is* the classic solution. I was once such a strong advocate of it that it's even referred to sometimes as the 'strawberry method/query' - although I certainly had nothing to do with its invention! But times change. The uncorrelated subquery method is almost always fast*er* (but not always fast*est*) – Strawberry Jan 09 '14 at 10:03