1

I have a question about sql. I have a question looks like this.

 +----+-------+
 | Id | Score |
 +----+-------+
 | 1  | 3.50  |
 | 2  | 3.65  |
 | 3  | 4.00  |
 | 4  | 3.85  |
 | 5  | 4.00  |
 | 6  | 3.65  |
 +----+-------+

The table is called 'Scores' and after ranking the score here, it will look like this,

 +-------+------+
 | Score | Rank |
 +-------+------+
 | 4.00  | 1    |
 | 4.00  | 1    |
 | 3.85  | 2    |
 | 3.65  | 3    |
 | 3.65  | 3    |
 | 3.50  | 4    |
 +-------+------+

Here is a sample answer but I am confused about the part after WHERE.

select 
    s.Score, 
    (select count(distinct Score) from Scores where Score >= s.Score) 
 Rank
 from Scores s
 order by s.Score Desc;

This Score >= s.Score is something like Score column compare with itself. I totally feel confused about this part. How does it work? Thank you!

E.

Eleanor
  • 2,647
  • 5
  • 18
  • 30

3 Answers3

2

One way to understand this is to just run the query for each row of your sample data. Starting with the first row, we see that the score is 4.00. The correlated subquery in the select clause:

(select count(distinct Score) from Scores where Score >= s.Score)

will return a count of 1, because there is only one record whose distinct score is greater than or equal to 4.00. This is also the case for the second record in your data, which has a score of 4.00 as well. For the score 3.85, the subquery would find a distinct count of 2, because there are two scores which are greater than or equal to 3.85, namely 3.85 and 4.00. You can apply this logic across the whole table to convince yourself of how the query works.

+-------+------+
| Score | Rank |
+-------+------+
| 4.00  | 1    | <-- 1 score >= 4.00   
| 4.00  | 1    | <-- 1 score >= 4.00
| 3.85  | 2    | <-- 2 scores >= 3.85
| 3.65  | 3    | <-- 3 scores >= 3.65
| 3.65  | 3    | <-- 3 scores >= 3.65
| 3.50  | 4    | <-- 4 scores >= 3.50
+-------+------+
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
1

This is known as a dependent subquery (and can be quite inefficient). A dependent subquery - basically means it cannot be turned into a join because it "depends" on a specific value - runs for every result row in the output for the specific "dependent" values. In this case each result-row already has a "specific" value of s.Score.

The 'Score' in the dependent subquery refers to the original table and not the outer query.

It may be more clear with an additional alias:

select 
  s.Score, 
  (select count(distinct other_scores.Score)
   from Scores other_scores
   where other_scores.Score >= s.Score) Rank -- value of s.Score is known
                                             -- and placed directly into dependent subquery
from Scores s
order by s.Score Desc;

"Modern" SQL dialects (including MySQL 8.0+) provide "RANK" and "DENSE_RANK" Window Functions to answer these sorts of queries. Window Functions, where applicable, are often much faster than dependent queries because the Query Planner can optimize at a higher level: these functions also have a tendency to tame otherwise gnarly SQL.

The MySQL 8+ SQL Syntax that ought to do the trick:

select 
  s.Score, 
  DENSE_RANK() over w AS Rank
from Scores s
window w as (order by Score desc)

There are also various work-abouts to emulate ROW_NUMBER / Window Functions for older versions of MySQL.

user2864740
  • 60,010
  • 15
  • 145
  • 220
  • @TimBiegeleisen Not it :} – user2864740 Dec 25 '17 at 03:35
  • What is an efficient way to do this? Any examples or key words that I can search in google? i don't even know what to search. lol – Eleanor Dec 28 '17 at 22:23
  • @Eleanor See https://stackoverflow.com/questions/1895110/row-number-in-mysql - note that *MySQL 8.0* adds as `RANK` function. The answers there also show alternative methods for earlier MySQL versions. (This shows I've not used MySQL in years..!) – user2864740 Dec 28 '17 at 22:35
  • @Eleanor If using MySQL 8+, I would recommend using the new [Window Function](https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html) support and "DENSE_RANK". I've also updated the answer to include the link, and to reflect that MySQL has finally fixed this coverage gap.. – user2864740 Dec 28 '17 at 22:41
  • @Eleanor I've "copied" the syntax from the link into the question and modified it slightly for the table, although I have not verified it - I'm not a MySQL user. YMMV. :) – user2864740 Dec 28 '17 at 22:45
0

Because it is dependent subquery. Every subquery will need to be re-evaluate on each row from outter query. If you familiar with Python, you can think of it like this:

from collections import namedtuple
ScoreTuple = namedtuple('ScoreTuple', ['Id', 'Score'])
Scores = [ScoreTuple(1, 3.50), 
          ScoreTuple(2, 3.65), 
          ScoreTuple(3, 4.00), 
          ScoreTuple(4, 3.85),
          ScoreTuple(5, 4.00),
          ScoreTuple(6, 3.65)]
Rank = []
for s in Scores: # each row from outter query
    rank = len(set([innerScore.Score                  # SELECT COUNT(DISTINCT Score)
                    for innerScore in Scores          # FROM Scores 
                    if innerScore.Score >= s.Score])) # WHERE Score >= s.Score
    Rank.append(rank)