MySQL GROUP BY and HAVING

Question

I'm grouping my results based on a column X and I want to return the rows that has highest Column Y's value in the group.

SELECT * 
FROM   mytable 
GROUP  BY col1 
HAVING col2 >= (SELECT MAX(col2) 
                FROM   mytable AS mytable2 
                WHERE  mytable2.col1 = mytable.col1 GROUP BY mytable2.col1)

I want to optimize the query above. Is it doable without sub-queries?

I found the solution and it's simpler than you think:

SELECT * FROM (SELECT * FROM mytable ORDER BY col2 DESC) temp GROUP BY col1

Runs in 5 milliseconds on 20,000 rows.

Are you under the impression that a subquery is suboptimal? How do you think the db engine would carry out the request, if there were another way to express it? — harpo, Feb 25 '11 at 01:27

OMG Ponies · Answer 1 · 2011-02-25T02:06:53.860

13

Using a derived table/inline view for a JOIN:

SELECT x.* 
  FROM mytable x
  JOIN (SELECT t.col1,
               MAX(t.col2) AS max_col2
          FROM MYTABLE t
      GROUP BY t.col1) y ON y.col1 = x.col1
                        AND y.max_col2 >= x.col2

Be aware that this will duplicate x records if there's more than one related y record. To remove duplicates, use DISTINCT:

SELECT DISTINCT x.* 
  FROM mytable x
  JOIN (SELECT t.col1,
               MAX(t.col2) AS max_col2
          FROM MYTABLE t
      GROUP BY t.col1) y ON y.col1 = x.col1
                        AND y.max_col2 >= x.col2

The following is untested, but will not return duplicates (assuming valid):

SELECT x.* 
  FROM mytable x
 WHERE EXISTS (SELECT NULL
                 FROM MYTABLE y
                WHERE y.col1 = x.col1
             GROUP BY y.col1
               HAVING MAX(y.col2) >= x.col2)

edited Feb 25 '11 at 02:06

answered Feb 25 '11 at 01:34

OMG Ponies

325,700
82
523
502

I feel like there's a LIMIT 1 solution in there somewhere – Conrad Frix Feb 25 '11 at 01:48
Isn't the first query wrong? I would think that the inline view would have only one row (thus not completely joining to `x`). And even then, if you grouped by col1 instead, it would only have the max for each row-specific col1 – Matt Feb 25 '11 at 01:50
The first is returning duplicates (not useful), second is taking too long to run (3 seconds on 20,000 rows). – Elie Feb 25 '11 at 01:53
@Matt: No, the inline view will return rows for each col1 value. The MAX is identical to the original query supplied. – OMG Ponies Feb 25 '11 at 01:55
2

@Elie: Did you downvote me? I don't have your tables or data to test speed. It's up to you to index the tables. – OMG Ponies Feb 25 '11 at 01:57
@OMG Ponies - yep I misunderstood the question. I thought @Elie was trying to pull all rows who had a col2 equal to the MAX(col2) for the whole _table_. – Matt Feb 25 '11 at 01:57
@OMG Ponies: I didn't down vote you but didn't up vote you neither... The query I provided still runs faster than yours (around half a second). I'm still looking for a better optimization (thanks for your input btw) – Elie Feb 25 '11 at 02:06
@Elie: Your update relies on MySQL -- it won't port to most databases (SQLite is the only one I can think of that will allow it). – OMG Ponies Feb 25 '11 at 02:55

score 1 · Answer 2 · answered Mar 11 '14 at 14:24

1

Your Col2 never be > then MAX(col2) so i suggest to use col2 = MAX(col2)

so HERE is the QUERY

SELECT * FROM  mytable GROUP BY col1 HAVING  col2 = MAX(  col2 )

answered Mar 11 '14 at 14:24

Fisherman

5,949
1
29
35

MySQL GROUP BY and HAVING

2 Answers2