3

I have 20 classroom, each classroom has 50 students.

is it possible to find 10 best students in each classroom in a single query statement?

The table "klas":

int id
int cla_id
int stu_id
int final_score
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
Kokizzu
  • 24,974
  • 37
  • 137
  • 233

2 Answers2

7

In most databases, you can do:

select k.*
from (select k.*,
             row_number() over (partition by cla_id order by final_score desc) as seqnum
      from klas
     ) k
where seqnum <= 10;

This retrieves 10 students per class with the highest scores. It doesn't work in databases like MySQL and MS Access that do not support ANSI standard window functions like row_number().

The row_number() function applies a sequential numbering to rows. The numbering starts over again for each cla_id in your case -- because of the partition by clause. The row with number "1" is the one with the largest final_score -- because of the order by final_score desc. This can also be applied to aggregation functions like min() and sum(). In that case, it provides the aggregated value, but on every row of the original data set (the equivalent of doing an aggregation and join, but typically more efficiently).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • @KiswonoPrayogo: it's called "window function" and every modern DBMS supports this nowadays. The Postgres manual has a nice tutorial for them: http://www.postgresql.org/docs/current/static/tutorial-window.html –  May 31 '13 at 08:01
2

If the 10th best student has the same score as the 11th or 12th, you don't want to return only number 10, but also 11 and 12. So in that case it's better to use RANK(), like this:

;WITH ranking
   AS
(
    SELECT *
          ,RANK() OVER (PARTITION BY cla_id
                            ORDER BY final_score DESC) AS rank
      FROM klas
)
  SELECT cla_id
        ,rank
        ,stu_id
    FROM ranking
   WHERE rank <= 10
ORDER BY cla_id
        ,rank
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Bliek
  • 466
  • 3
  • 7
  • 1
    `dense_rank()` would even be better (and the `;` should go to the end of the statement, not the beginning for Postgres anyway, but you should get used to the standard way of writing statements even when using MS tools: https://sqlblog.org/2009/09/03/ladies-and-gentlemen-start-your-semi-colons) –  May 31 '13 at 08:03
  • 1
    @a_horse_with_no_name: dense_rank() will return all the students with the 10 highest different scores. That's not what is asked. – Bliek May 31 '13 at 10:58
  • 1
    No it won't. rank() will "skip" values, so if you have the values 12,12,10,9 `rank()` will return 1,1,3,4 whereas `dense_rank()` will return 1,1,2,3. With `rank()` there might not be a "10th" rank. –  May 31 '13 at 11:53
  • @a_horse_with_no_name: Well, what you correctly describe is the reason, why `dense_rank()` would be *wrong*. [We've had this topic on SO before.](http://stackoverflow.com/questions/9629953/postgresql-equivalent-for-top-n-with-ties-limit-with-ties/9630562#9630562) – Erwin Brandstetter May 31 '13 at 12:35
  • Note: the leading `;` is only valid in SQL-Server. The rest works in PostgreSQL just as well. – Erwin Brandstetter May 31 '13 at 12:41
  • @ErwinBrandstetter: it all depends on what exactly "*the 10 best students*" means. Should the query never return more than 10 rows? Or should the query return the students which rank from 1-10 (which might be more than 10 rows). If the first, I agree `rank()` is the correct solution. If the latter, `dense_rank()` is the correct solution. –  May 31 '13 at 12:46
  • @ErwinBrandstetter: the leading `;` is a "hack" for the Microsoft tools to avoid problems because with the Microsoft tools you don't need to use a `;` but the `WITH` keyword can also be used in a different context. For a single statement it is never required and SQL *Server* itself does not require it either, only the "Management Studio" –  May 31 '13 at 12:48
  • @a_horse_with_no_name: For your first case, **`row_number()`** would be the correct solution (like Gordon posted). For the second (which is the best answer here IMO), `rank()` is the correct solution. `dense_rank()` is never correct. It should be used to get "all students that have one of the 10 highest rankings". – Erwin Brandstetter May 31 '13 at 12:48
  • @ErwinBrandstetter: that's exactly what I meant. Or, to put it the other way: if you want to find the student that ranks a place #9 you cannot use `rank()`. It is a matter of interpretation what "the 10 best students*" means –  May 31 '13 at 12:51
  • @a_horse_with_no_name: If you want to find the student that ranks at place #9 you *can always* use `rank()`. Bliek's query returns *at least* 10 students per classroom. More if there are ties for the 10th place. Try it ... – Erwin Brandstetter May 31 '13 at 12:55
  • @ErwinBrandstetter: but with `rank()` there might not be a #9. –  May 31 '13 at 12:59
  • If two students tie for #8, there is no #9. The next student is placed on #10. For the `10 best students` (quoting the question), you have to use `row_number()` or `rank()` (preferably), never `dense_rank()`. – Erwin Brandstetter May 31 '13 at 13:04
  • @a_horse_with_no_name: corrective: `dense_rank()` ... should be used to get "all students that have one of the 10 highest *scores*". I wrote `rankings` above, which is misleading. – Erwin Brandstetter May 31 '13 at 16:20