4

Related to this question.

Actually lest say we want to solve the grouped ranking problem in mysql. We have a table that each row represents an entity, that belongs to a group. We want to assign a rank for each entity based on an attribute separate for each group. Later we could do various manipulations with the rank, like asking for the first 10 entities of each group that satisfy another condition as well, etc.

For example, the entity could be programmers that belong to different "groups" according to their favorite programming language. Then each programmer has a reputation (let's say in forum). We want to add an extra field that would be the programmer's rank based on descending reputation. We want to do this independently for each group.

gid | repu | name |
1       1    john
1       3    anna
2       2    scot
2       1    leni

to become

gid | repu | name | rank
1       3    anna      1
1       1    john      2
2       2    scot      1
2       1    leni      2

Now let's also demand that we do not want to use session variables based solutions. Yes they work pretty well but they clearly violate the mysql demand for not to read and write a session variable on the same statement. (See here)

Now a proposed solution in this post says

-- SOL #1 (SELF-JOIN)
SELECT a.*, count(*) as row_number FROM test a
JOIN test b ON a.gid = b.gid AND a.repu <= b.repu
GROUP BY a.gid, a.repu

Which pretty much does the thing. Some questions I have are, is this legit SQL or does it violate any standard or mysql quirk? Is is guaranteed that it will work on mysql ?

Also another solution that I read here is, which is more of a black magic for me but seems more elegant

-- SOL #2 (SUBQUERY)
SELECT t.* ,
    ( SELECT COUNT(*) + 1
        FROM test
        WHERE repu > t.repu AND gid = t.gid 
    ) AS rank
FROM test AS t
ORDER BY gid ASC, rank ASC  

This uses a sub query that references an outer table, and does the trick also. Could anybody explain how this one works ?

Also, the same questions here as for solution #1.

Plus any comments on evaluating the performance/compatibility of two proposed solutions.

EDIT: Aditional methods, for the reference

From this post one variation of the session variable method. WARNING: This is that I want to avoid. Notice that in a single statement that @rand and @partition session variables are read (in the case after WHEN and THEN) and written (in the CASE after THEN AND ELSE and also in the next subquery that initializes the variables).

-- SOL #3 (SESSION VARIABLES / ANTIPATTERN)
SELECT t.*, ( CASE gid
             WHEN @partition THEN @rank := @rank + 1 
             ELSE @rank := 1 AND @partition := gid ) AS rank
FROM test t, 
    (SELECT @rank := 0, @partition := '') tmp
ORDER BY gid ASC, repu DESC

Also here is the set based solution, rather complicated, posted by a fellow bellow.

-- SOL #4 (SET BASED)
SELECT x.*, FIND_IN_SET(CONCAT(x.gid,':',x.repu), y.c) rank 
    FROM test x 
    JOIN (
        SELECT GROUP_CONCAT(DISTINCT CONCAT(gid,':',repu) ORDER BY gid, repu DESC) c 
        FROM test GROUP BY gid
    ) y ON FIND_IN_SET(CONCAT(x.gid,':',x.repu), y.c)
Community
  • 1
  • 1
basos
  • 578
  • 4
  • 11
  • AFAIK, ALL the method's you've described are legit. My guess is that the variables method will be fastest, followed by a method you haven't described, followed by the subquery, followed by the join - but I'm only guessing. – Strawberry Feb 01 '13 at 17:34
  • Could you share the method that I haven't defined, please? – basos Feb 01 '13 at 17:35
  • 1
    I will if I remember it ... :-( – Strawberry Feb 01 '13 at 17:47
  • I think it looks like this - but now that I come to write it down I can't see how it could possibly be faster than the other methods described... `SELECT x.gid, x.repu, x.name, FIND_IN_SET(CONCAT(x.gid,':',x.repu),y.c) rank FROM test x JOIN (SELECT GROUP_CONCAT(DISTINCT CONCAT(gid,':',repu) ORDER BY gid,repu DESC) c FROM test GROUP BY gid) y ON FIND_IN_SET(CONCAT(x.gid,':',x.repu),y.c);` – Strawberry Feb 01 '13 at 17:57
  • You should understand that MySQL recommendations, are just recommendations, but if you have a table with groups where the number of items within a group is rather large, you can wait forever until the valid query finishes: see http://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group/8757062#8757062 – newtover Feb 01 '13 at 17:58
  • The warning against using session variables doesn't affect this solution. The warning is about reading the variable is a *separate expression* and expecting a certain order of execution of the expressions. But the MySQL solution to simulating ranking using session variables does not use more than one expression, so it's safe to use that solution. – Bill Karwin Feb 01 '13 at 18:02
  • @BillKarwin But is there a guarantee that it will work as expected (even with optimizer improvements) in future MySQL versions? Will it work in 5.6? – ypercubeᵀᴹ Feb 02 '13 at 11:04
  • @ypercube, I just tested it with 5.6.9 and it gives the same result as the other solutions. I can see no reason why the semantics would ever change. On the other hand, there's never any *guarantee* that features will never change in *any* future version of any software. – Bill Karwin Feb 02 '13 at 20:26

1 Answers1

0

JOIN is legit MYSQL syntax. If it wasn't working, doubt anyone would mark it as the answer.

In terms of subquery, it will be less faster than the first solution. Looking at EXPLAIN PLAN would be a great idea to understand the execution of these queries.

There's another way to achieve the same:-

  • -- SOL #3: Answer with 30 votes in this post:

ROW_NUMBER() in MySQL

Community
  • 1
  • 1
bonCodigo
  • 14,268
  • 1
  • 48
  • 91
  • Actually my concern about join solution is if the group by can make any rows accidentally consolidated. To be honest the GROUP BY a.repu confuses me. Why would we need to group by the value that we are comparing? I have not encountered it before. Usually GROUP BY is used for columns that represent a kind of category. – basos Feb 01 '13 at 17:43
  • About SOL#3 this is the session variable based one, that despite the 30 upvotes is against mysql recommendations. – basos Feb 01 '13 at 17:44