1

Given a table CREATE TABLE t (id INT PRIMARY KEY, col1 INT, col2 VARCHAR(20));. How to efficiently find the max N values in col1 grouped by col2?

As an example, for N=2, I need to write a query to get the RHS table from table t:

+----+------+------+           
| id | col1 | col2 |         +----+------+------+
+------------------+         | id | col1 | col2 |
|  1 |    1 |  A   |         +----+------+------+
|  2 |    1 |  A   |         |  1 |    1 | A    |
|  3 |    2 |  A   |   -->   |  2 |    1 | A    |
|  4 |   10 |  B   |         |  3 |    2 | A    |
|  5 |   20 |  B   |         |  5 |   20 | B    |
|  6 |   30 |  B   |         |  6 |   30 | B    |
|  7 |  100 |  C   |         |  7 |  100 | C    |
+----+------+------+         +----+------+------+

      Table: t                Table: query result

For group A, it needs to return all three rows because the max 2 numbers are (1,2) and there are three matches in group A; for group B the max 2 numbers are (20, 30) and there are two matches; group C only have one max value which is 100 so only returning that row is enough.

I got this result by using correlated subquery. The codes are as follows:

select id, col1, col2 
from t as t1
where (
       select count(distinct t2.col1) from t as t2
       where t1.col2 = t2.col2 and t1.col1 < t2.col1
) < 2;

However, as mentioned by this post, this query runs on O(n^2) (n=# of rows). I was wondering could someone teaches me a different technique other than correlated subquery that can run faster? I'm a beginner in MySQL, so it would be great if you can also indicate the name of the technique you are using, or in layman's term how the solution work. Thank you so much!

Smile
  • 113
  • 1
  • 7
  • 1
    Just about all the methods are described in the post I linked to. If you're using a recent version of MySQL, window functions are probably the best way. – Barmar Nov 11 '20 at 05:12
  • in particular, if you can use windows functions (requires mysql 8+ or mariadb 10.2+), look at the row_number() and rank() examples (only you want dense_rank instead IIUC); otherwise look at the group_concat example (only you want `group_concat(distinct ...)` to be the equivalent of dense_rank, not row_number) – ysth Nov 11 '20 at 17:56

0 Answers0