16

I'm grouping my results based on a column X and I want to return the rows that has highest Column Y's value in the group.

SELECT * 
FROM   mytable 
GROUP  BY col1 
HAVING col2 >= (SELECT MAX(col2) 
                FROM   mytable AS mytable2 
                WHERE  mytable2.col1 = mytable.col1 GROUP BY mytable2.col1) 

I want to optimize the query above. Is it doable without sub-queries?

I found the solution and it's simpler than you think:

SELECT * FROM (SELECT * FROM mytable ORDER BY col2 DESC) temp GROUP BY col1

Runs in 5 milliseconds on 20,000 rows.

Elie
  • 6,915
  • 7
  • 31
  • 35
  • 3
    Are you under the impression that a subquery is suboptimal? How do you think the db engine would carry out the request, if there were another way to express it? – harpo Feb 25 '11 at 01:27
  • 3
    @harpo: True, but this example is a correlated subquery. – OMG Ponies Feb 25 '11 at 01:34

2 Answers2

13

Using a derived table/inline view for a JOIN:

SELECT x.* 
  FROM mytable x
  JOIN (SELECT t.col1,
               MAX(t.col2) AS max_col2
          FROM MYTABLE t
      GROUP BY t.col1) y ON y.col1 = x.col1
                        AND y.max_col2 >= x.col2

Be aware that this will duplicate x records if there's more than one related y record. To remove duplicates, use DISTINCT:

SELECT DISTINCT x.* 
  FROM mytable x
  JOIN (SELECT t.col1,
               MAX(t.col2) AS max_col2
          FROM MYTABLE t
      GROUP BY t.col1) y ON y.col1 = x.col1
                        AND y.max_col2 >= x.col2

The following is untested, but will not return duplicates (assuming valid):

SELECT x.* 
  FROM mytable x
 WHERE EXISTS (SELECT NULL
                 FROM MYTABLE y
                WHERE y.col1 = x.col1
             GROUP BY y.col1
               HAVING MAX(y.col2) >= x.col2)
OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
  • I feel like there's a LIMIT 1 solution in there somewhere – Conrad Frix Feb 25 '11 at 01:48
  • Isn't the first query wrong? I would think that the inline view would have only one row (thus not completely joining to `x`). And even then, if you grouped by col1 instead, it would only have the max for each row-specific col1 – Matt Feb 25 '11 at 01:50
  • The first is returning duplicates (not useful), second is taking too long to run (3 seconds on 20,000 rows). – Elie Feb 25 '11 at 01:53
  • @Matt: No, the inline view will return rows for each col1 value. The MAX is identical to the original query supplied. – OMG Ponies Feb 25 '11 at 01:55
  • 2
    @Elie: Did you downvote me? I don't have your tables or data to test speed. It's up to you to index the tables. – OMG Ponies Feb 25 '11 at 01:57
  • @OMG Ponies - yep I misunderstood the question. I thought @Elie was trying to pull all rows who had a col2 equal to the MAX(col2) for the whole _table_. – Matt Feb 25 '11 at 01:57
  • @OMG Ponies: I didn't down vote you but didn't up vote you neither... The query I provided still runs faster than yours (around half a second). I'm still looking for a better optimization (thanks for your input btw) – Elie Feb 25 '11 at 02:06
  • @Elie: Your update relies on MySQL -- it won't port to most databases (SQLite is the only one I can think of that will allow it). – OMG Ponies Feb 25 '11 at 02:55
1

Your Col2 never be > then MAX(col2) so i suggest to use col2 = MAX(col2)

so HERE is the QUERY

SELECT * FROM  mytable GROUP BY col1 HAVING  col2 = MAX(  col2 ) 
Fisherman
  • 5,949
  • 1
  • 29
  • 35