How to return more than one row per GROUP BY condition

Question

I'm trying to get the TOP X results for a given GROUP BY condition. I'm currently using something like this:

SELECT * FROM 
        (SELECT id  
            FROM myTable 
            WHERE id IN (x1, x2, ..., xn) GROUP BY id ORDER BY grade DESC 
        ) t1 
        INNER JOIN myTable t2 ON t2.id=t1.id

id is a non-unique INT indexed field, with multiple rows per value.

This returns, for each id, the row with the best grade. How can I change it to return the TOP X rows for each id?

For example, for the following data

assuming X from TOP X is 2, I would like to get the rows of:

What if there are `15 13 13` results with `id = 2`? Would you show 3 then? — zerkms, Nov 10 '13 at 08:44
What if there is a tie? For example, 3 records have the same grade for a given id? — Przemyslaw Kruglej, Nov 10 '13 at 08:44
@Przemyslaw Kruglej: what if there is tie on SO members thinking alike in the very same moment? ;-) — zerkms, Nov 10 '13 at 08:45
Not worried about a stable order in case of tie. Just bring one of them. — Noam, Nov 10 '13 at 08:50

DWand · Answer 1 · 2013-11-10T08:55:06.820

Maybe something like this?

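-- Note: TOP is SQL Server (T-SQL) syntax.
SELECT m.*
FROM (
  -- One row per requested id.
  SELECT id
  FROM myTable
  WHERE id IN (1, 3)
  GROUP BY id
) AS ids
INNER JOIN myTable AS m ON ids.id = m.id
WHERE
  -- Keep rows whose grade is among the 5 highest grades for that id.
  -- Rows tied on one of those grades are all returned.
  m.grade IN (
    SELECT TOP 5 t.grade
    FROM myTable AS t
    WHERE t.id = ids.id
    ORDER BY t.grade DESC
  );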

UPD: Or even without the derived table:

SELECT m.*
FROM myTable AS m
WHERE
  m.id IN (1, 2) AND
  m.grade IN (
    SELECT TOP 5 t.grade
    FROM myTable AS t
    WHERE t.id = m.id
    ORDER BY t.grade DESC
  );

score 0 · Answer 2 · answered Nov 10 '13 at 08:51

It depends on whether you want ties to be returned.

If you want the ties returned, you can use the approach below:

CREATE TABLE grades (
  id INT,
  grade INT
);

INSERT INTO grades VALUES (1, 2);
INSERT INTO grades VALUES (1, 3);
INSERT INTO grades VALUES (1, 4);
INSERT INTO grades VALUES (1, 5);

INSERT INTO grades VALUES (2, 5);
INSERT INTO grades VALUES (2, 5);
INSERT INTO grades VALUES (2, 5);
INSERT INTO grades VALUES (2, 4);

INSERT INTO grades VALUES (3, 3);
INSERT INTO grades VALUES (3, 4);

INSERT INTO grades VALUES (4, 3);

SELECT id, grade
  FROM grades g
WHERE (
   SELECT COUNT(DISTINCT grade) FROM grades
   WHERE id = g.id
      AND grade >= g.grade
) <= 2;

Output:

ID  GRADE
1   4
1   5
2   5
2   5
2   5
2   4
3   3
3   4
4   3

If you do not want the ties, use DISTINCT:

SELECT DISTINCT id, grade
  FROM grades g
WHERE (
   SELECT COUNT(DISTINCT grade) FROM grades
   WHERE id = g.id
      AND grade >= g.grade
) <= 2;

Output:

ID  GRADE
1   4
1   5
2   5
2   4
3   3
3   4
4   3

SQLFiddle: SQLFiddle
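
Both queries above select by grade value, so an id can still come back with more than X rows (or, with DISTINCT, with collapsed duplicates). The asker's comment says a tie may be broken arbitrarily, which suggests returning at most X rows per id. One possible way to do that, assuming the database supports window functions, is ROW_NUMBER(). The sketch below is not from the original answers: it runs the query against the sample data using SQLite (3.25 or newer) through Python's standard sqlite3 module, purely so the example is self-contained. The names ROWS, TOP_X_SQL and top_x_per_id are hypothetical, and nothing here is a claim about how it performs on a very large table.

```python
import sqlite3

# Sample data copied from the CREATE TABLE / INSERT statements in the answer.
ROWS = [
    (1, 2), (1, 3), (1, 4), (1, 5),
    (2, 5), (2, 5), (2, 5), (2, 4),
    (3, 3), (3, 4),
    (4, 3),
]

# ROW_NUMBER() numbers the rows inside each id from the highest grade down.
# Tied grades still receive distinct numbers, so at most :x rows survive
# per id; which of the tied rows is kept is arbitrary.
TOP_X_SQL = """
SELECT id, grade
FROM (
    SELECT id,
           grade,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY grade DESC) AS rn
    FROM grades
) AS ranked
WHERE rn <= :x
ORDER BY id, grade DESC
"""


def top_x_per_id(x):
    """Return up to x highest-grade rows for every id (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE grades (id INT, grade INT)")
        conn.executemany("INSERT INTO grades VALUES (?, ?)", ROWS)
        return conn.execute(TOP_X_SQL, {"x": x}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_x_per_id(2):
        print(row)
```

With X = 2, id 2 yields two rows with grade 5 rather than all three, and id 4 yields its single row. The inner SELECT is the part that would carry over to another engine; the surrounding Python only loads the data and runs it.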

Any way to avoid the subquery? Will need to check, but assume this will be a performance killer. This is intended to run on a table with billions of rows. — Noam, Nov 10 '13 at 08:54
@Noam: why not store operational data in a separate table (shard, partition, whatever)? — zerkms, Nov 10 '13 at 08:57
@Noam You could've mentioned it in your question. There is a lot on this topic on google, for example: [TOP X In Mysql #1](http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/) or [TOP X In Mysql #2](http://stackoverflow.com/questions/12113699/get-top-n-records-for-each-group-of-grouped-results) — Przemyslaw Kruglej, Nov 10 '13 at 08:57
