Related to this question.
Actually lest say we want to solve the grouped ranking problem in mysql. We have a table that each row represents an entity, that belongs to a group. We want to assign a rank for each entity based on an attribute separate for each group. Later we could do various manipulations with the rank, like asking for the first 10 entities of each group that satisfy another condition as well, etc.
For example, the entity could be programmers that belong to different "groups" according to their favorite programming language. Then each programmer has a reputation (let's say in forum). We want to add an extra field that would be the programmer's rank based on descending reputation. We want to do this independently for each group.
gid | repu | name |
1 1 john
1 3 anna
2 2 scot
2 1 leni
to become
gid | repu | name | rank
1 3 anna 1
1 1 john 2
2 2 scot 1
2 1 leni 2
Now let's also demand that we do not want to use session variables based solutions. Yes they work pretty well but they clearly violate the mysql demand for not to read and write a session variable on the same statement. (See here)
Now a proposed solution in this post says
-- SOL #1 (SELF-JOIN)
SELECT a.*, count(*) as row_number FROM test a
JOIN test b ON a.gid = b.gid AND a.repu <= b.repu
GROUP BY a.gid, a.repu
Which pretty much does the thing. Some questions I have are, is this legit SQL or does it violate any standard or mysql quirk? Is is guaranteed that it will work on mysql ?
Also another solution that I read here is, which is more of a black magic for me but seems more elegant
-- SOL #2 (SUBQUERY)
SELECT t.* ,
( SELECT COUNT(*) + 1
FROM test
WHERE repu > t.repu AND gid = t.gid
) AS rank
FROM test AS t
ORDER BY gid ASC, rank ASC
This uses a sub query that references an outer table, and does the trick also. Could anybody explain how this one works ?
Also, the same questions here as for solution #1.
Plus any comments on evaluating the performance/compatibility of two proposed solutions.
EDIT: Aditional methods, for the reference
From this post one variation of the session variable method. WARNING: This is that I want to avoid. Notice that in a single statement that @rand and @partition session variables are read (in the case after WHEN and THEN) and written (in the CASE after THEN AND ELSE and also in the next subquery that initializes the variables).
-- SOL #3 (SESSION VARIABLES / ANTIPATTERN)
SELECT t.*, ( CASE gid
WHEN @partition THEN @rank := @rank + 1
ELSE @rank := 1 AND @partition := gid ) AS rank
FROM test t,
(SELECT @rank := 0, @partition := '') tmp
ORDER BY gid ASC, repu DESC
Also here is the set based solution, rather complicated, posted by a fellow bellow.
-- SOL #4 (SET BASED)
SELECT x.*, FIND_IN_SET(CONCAT(x.gid,':',x.repu), y.c) rank
FROM test x
JOIN (
SELECT GROUP_CONCAT(DISTINCT CONCAT(gid,':',repu) ORDER BY gid, repu DESC) c
FROM test GROUP BY gid
) y ON FIND_IN_SET(CONCAT(x.gid,':',x.repu), y.c)