0

I have a data schema/csv like this (cols:id...score) with over 90000 rows and i need ranks as shown - grouped first by year and then by class. Can any please help solve this either in MYSQL or in R

id  Year  class  name    score rank
1   2010  Phy    joe      95    2
2   2010  Phy    amy      98    1
3   2010  Phy    carl     58    3
4   2010  Mat    joe      88    3
5   2010  Mat    amy      100   1
6   2010  Mat    carl     95    2
7   2011  Phy    joe      84    1
8   2011  Phy    amy      25    3
9   2011  Phy    carl     48    2
10  2011  Mat    joe      56    2
11  2011  Mat    amy      85    1
12  2011  Mat    carl     48    3
  • Mysql does not support rank as some RDBMS do, but this might help: http://stackoverflow.com/questions/3333665/mysql-rank-function – John Powell May 26 '14 at 16:46

2 Answers2

1

Assuming your data is stored in R as a data.frame called dd then you can calculate the ranks as you've defined them as

dd$ranks<-with(dd, ave(score, Year, class, FUN=function(x) rank(-x)))

Note that rank has several options for ties, so you might want to read ?rank to see which one is for you.

MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • excellent - but I just get the rank now. How do I get the whole table + rank? – user3677045 May 26 '14 at 18:18
  • @user3677045 I'm not sure exactly what you ran then. The above code adds a "ranks" column to the `dd` data.frame with `dd$ranks<-` so the new ranks should be added to `dd`. – MrFlick May 26 '14 at 18:23
0

Consider the following...

 SET NAMES utf8;

 DROP TABLE IF EXISTS my_table;

 CREATE TABLE my_table
 (id  INT NOT NULL AUTO_INCREMENT PRIMARY KEY
 ,Year  INT NOT NULL
 ,class  VARCHAR(12) NOT NULL
 ,name    VARCHAR(12) NOT NULL
 ,score INT
 );

 INSERT INTO my_table VALUES
 (1   ,2010  ,'Phy','joe',95),
 (2   ,2010  ,'Phy','amy',      98    ),
 (3   ,2010  ,'Phy','carl',     58    ),
 (4   ,2010  ,'Mat','joe',      88    ),
 (5   ,2010  ,'Mat','amy',      100   ),
 (6   ,2010  ,'Mat','carl',     95    ),
 (7   ,2011  ,'Phy','joe',      84    ),
 (8   ,2011  ,'Phy','amy',      25    ),
 (9   ,2011  ,'Phy','carl',     48    ),
 (10  ,2011  ,'Mat','joe',      56    ),
 (11  ,2011  ,'Mat','amy',      85    ),
 (12  ,2011  ,'Mat','carl',     48    );

1.)

 SELECT x.*
      , COUNT(*) rank 
   FROM my_table x 
   JOIN my_table y 
     ON y.year = x.year 
    AND y.class=x.class 
    AND y.score >= x.score 
  GROUP 
     BY x.year
      , x.class
      , x.score 
  ORDER 
     BY ID;
 +----+------+-------+------+-------+------+
 | id | Year | class | name | score | rank |
 +----+------+-------+------+-------+------+
 |  1 | 2010 | Phy   | joe  |    95 |    2 |
 |  2 | 2010 | Phy   | amy  |    98 |    1 |
 |  3 | 2010 | Phy   | carl |    58 |    3 |
 |  4 | 2010 | Mat   | joe  |    88 |    3 |
 |  5 | 2010 | Mat   | amy  |   100 |    1 |
 |  6 | 2010 | Mat   | carl |    95 |    2 |
 |  7 | 2011 | Phy   | joe  |    84 |    1 |
 |  8 | 2011 | Phy   | amy  |    25 |    3 |
 |  9 | 2011 | Phy   | carl |    48 |    2 |
 | 10 | 2011 | Mat   | joe  |    56 |    2 |
 | 11 | 2011 | Mat   | amy  |    85 |    1 |
 | 12 | 2011 | Mat   | carl |    48 |    3 |
 +----+------+-------+------+-------+------+

2.)

 SELECT id
      , year
      , class
      , name
      , score
      , rank
   FROM 
      ( SELECT x.*
             , IF(@pclass = class,IF(@pyear=year,@i:=@i+1,@i:=1),@i:=1)rank
             , @pyear := year
             , @pclass := class
          FROM my_table  x
             , (SELECT @pyear:='',@pclass:='',@i:=1)vals
         ORDER
            BY year,class,score DESC
      ) m
  ORDER
     BY id;
 +----+------+-------+------+-------+------+
 | id | year | class | name | score | rank |
 +----+------+-------+------+-------+------+
 |  1 | 2010 | Phy   | joe  |    95 |    2 |
 |  2 | 2010 | Phy   | amy  |    98 |    1 |
 |  3 | 2010 | Phy   | carl |    58 |    3 |
 |  4 | 2010 | Mat   | joe  |    88 |    3 |
 |  5 | 2010 | Mat   | amy  |   100 |    1 |
 |  6 | 2010 | Mat   | carl |    95 |    2 |
 |  7 | 2011 | Phy   | joe  |    84 |    1 |
 |  8 | 2011 | Phy   | amy  |    25 |    3 |
 |  9 | 2011 | Phy   | carl |    48 |    2 |
 | 10 | 2011 | Mat   | joe  |    56 |    2 |
 | 11 | 2011 | Mat   | amy  |    85 |    1 |
 | 12 | 2011 | Mat   | carl |    48 |    3 |
 +----+------+-------+------+-------+------+

Suggestion 2 will likely be orders of magnitude faster than suggestion 1

Strawberry
  • 33,750
  • 13
  • 40
  • 57