Given a table CREATE TABLE t (id INT PRIMARY KEY, col1 INT, col2 VARCHAR(20));
. How to efficiently find the max N values in col1
grouped by col2
?
As an example, for N=2, I need to write a query to get the RHS table from table t
:
+----+------+------+
| id | col1 | col2 | +----+------+------+
+------------------+ | id | col1 | col2 |
| 1 | 1 | A | +----+------+------+
| 2 | 1 | A | | 1 | 1 | A |
| 3 | 2 | A | --> | 2 | 1 | A |
| 4 | 10 | B | | 3 | 2 | A |
| 5 | 20 | B | | 5 | 20 | B |
| 6 | 30 | B | | 6 | 30 | B |
| 7 | 100 | C | | 7 | 100 | C |
+----+------+------+ +----+------+------+
Table: t Table: query result
For group A, it needs to return all three rows because the max 2 numbers are (1,2) and there are three matches in group A; for group B the max 2 numbers are (20, 30) and there are two matches; group C only have one max value which is 100 so only returning that row is enough.
I got this result by using correlated subquery. The codes are as follows:
select id, col1, col2
from t as t1
where (
select count(distinct t2.col1) from t as t2
where t1.col2 = t2.col2 and t1.col1 < t2.col1
) < 2;
However, as mentioned by this post, this query runs on O(n^2) (n=# of rows). I was wondering could someone teaches me a different technique other than correlated subquery that can run faster? I'm a beginner in MySQL, so it would be great if you can also indicate the name of the technique you are using, or in layman's term how the solution work. Thank you so much!