0

I'm trying to get the TOP X results for a given GROUP BY condition. I'm currently using something like this:

SELECT * FROM 
        (SELECT id  
            FROM myTable 
            WHERE id IN (x1, x2, ..., xn) GROUP BY id ORDER BY grade DESC 
        ) t1 
        INNER JOIN myTable t2 ON t2.id=t1.id

id is a non-unique INT indexed field, with multiple rows per value.

This returns me for each id, the row with the best grade. How can I convert this to return the TOP X results for each id?

For example, for the following data

id  grade
2   10
2   13
2   15
3   20
4   16
4   55
4   45
4   35
4   25
5   1

assuming X from TOP X is 2, I would like to get the rows of:

id grade  
2  15  
2  13  
3  20  
4  50  
4  40  
5  1  
Andriy M
  • 76,112
  • 17
  • 94
  • 154
Noam
  • 3,341
  • 4
  • 35
  • 64

2 Answers2

0

Maybe, something like this?

SELECT m.*
FROM (
  SELECT id
  FROM myTable 
  WHERE id IN (1, 3)
  GROUP BY id
) AS ids
RIGHT JOIN myTable AS m ON ids.id = m.id
WHERE
  m.id = ids.id AND
  m.grade IN (
    SELECT TOP 5 t.grade
    FROM myTable AS t
    WHERE t.id = ids.id
    ORDER BY t.grade DESC
  );

UPD: Or, even

SELECT m.*
FROM myTable AS m
WHERE
  m.id IN (1, 2) AND
  m.grade IN (
    SELECT TOP 5 t.grade
    FROM myTable AS t
    WHERE t.id = m.id
    ORDER BY t.grade DESC
  );
DWand
  • 674
  • 7
  • 16
0

It depends whether or not you want the ties to be returned, or not.

If you want the ties returned, you can use below approach:

CREATE TABLE grades (
  id INT,
  grade INT
);

INSERT INTO grades VALUES (1, 2);
INSERT INTO grades VALUES (1, 3);
INSERT INTO grades VALUES (1, 4);
INSERT INTO grades VALUES (1, 5);

INSERT INTO grades VALUES (2, 5);
INSERT INTO grades VALUES (2, 5);
INSERT INTO grades VALUES (2, 5);
INSERT INTO grades VALUES (2, 4);

INSERT INTO grades VALUES (3, 3);
INSERT INTO grades VALUES (3, 4);

INSERT INTO grades VALUES (4, 3);

SELECT id, grade
  FROM grades g
WHERE (
   SELECT COUNT(DISTINCT grade) FROM grades
   WHERE id = g.id
      AND grade >= g.grade
) <= 2;

Output:

ID     GRADE
1   4
1   5
2   5
2   5
2   5
2   4
3   3
3   4
4   3

If you do not want the ties, use DISTINCT:

SELECT DISTINCT id, grade
  FROM grades g
WHERE (
   SELECT COUNT(DISTINCT grade) FROM grades
   WHERE id = g.id
      AND grade >= g.grade
) <= 2;

Output:

ID     GRADE
1   4
1   5
2   5
2   4
3   3
3   4
4   3

SQLFiddle: SQLFiddle

Przemyslaw Kruglej
  • 8,003
  • 2
  • 26
  • 41
  • Any way to avoid the subquery? Will need to check, but assume this will be a performance killer. This is intended to run on a table with billions of rows. – Noam Nov 10 '13 at 08:54
  • @Noam: why not store operational data in a separate table (shard, partition, whatever)? – zerkms Nov 10 '13 at 08:57
  • 1
    @Noam You could've mentioned it in your question. There is a lot on this topic on google, for example: [TOP X In Mysql #1](http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/) or [TOP X In Mysql #2](http://stackoverflow.com/questions/12113699/get-top-n-records-for-each-group-of-grouped-results) – Przemyslaw Kruglej Nov 10 '13 at 08:57