2

This is a question from Stanford online database course exercise. Find the movie(s) with the highest average rating. Return these movie title(s) and their average rating. Use SQLite.

I've seen solutions kindly suggested by others, e.g,

  1. fetch the row with max values.
  2. get top entries.

But what I hope to understand here is where and why my current solution present here went wrong.


The movie rating table:

rID mID stars   ratingDate
201 101 2   2011-01-22
201 101 4   2011-01-27
203 103 2   2011-01-20
203 108 4   2011-01-12
203 108 2   2011-01-30
204 101 3   2011-01-09
205 103 3   2011-01-27
205 104 2   2011-01-22
...

Note: mID represents movie ID, rID represents reviewer ID, stars represent the scores ranked by reviewers.

My first thought is to get the average scores for each movie, using code as following:

Select mID, avg(stars) AS avg_stars
From Rating
Group by mID

The resulting summary table is

mID avg_stars
101 3.0
103 2.5
104 2.5
106 4.5
107 4.0
108 3.3

Then I want to select the max values of the scores column and the assciated mIDs

Select mID, max(avg_stars) AS Max_score
From (
Select mID, avg(stars) AS avg_stars
From Rating
Group by mID) T

I expected to get:

mID Max_score
106 4.5

But instead, I get:

mID Max_score
108 4.5
Community
  • 1
  • 1
enaJ
  • 1,565
  • 5
  • 16
  • 29
  • 1
    Did you look at this: http://stackoverflow.com/questions/11320039/mysql-get-top-average-entries – random_user_name Jan 03 '16 at 15:34
  • you should add another group by clause in your second query, i.e `Select mID, max(avg_stars) AS Max_score From ( Select mID, avg(stars) AS avg_stars From Rating Group by mID) Group By mID` – Nikos M. Jan 03 '16 at 15:37
  • 1
    Is this supposed to be answered with generic Standard SQL or RDBMS-specific syntax? – dnoeth Jan 03 '16 at 15:39
  • 2
    You seem to use MySQL as a DBMS, which allows non-Standard syntax: When you return `mID` without adding it to `GROUP BY` MySQL returns a single row with the maximum(average) and a *random* value for mID. – dnoeth Jan 03 '16 at 15:45
  • Hi dnoeth, your explaination sounds reasonable. But it doesn't seem to work adding Group By. My DBMS is SQLite. Is that possible your solution works for MySQL but not SQLite? – enaJ Jan 03 '16 at 15:48
  • Adding mID to the outer Select will not work, because it's not changing the result: "find the max of the average for each mID". You need to apply the logic I've shown in my answer... – dnoeth Jan 03 '16 at 15:59
  • Does @dnoeth's second solution work in SQLite? When you accept an answer, consider making this information available for future readers. – Dan Getz Jan 03 '16 at 17:23

4 Answers4

4

You seem to use MySQL as a DBMS, which allows non-Standard syntax:

When you return mID without adding it to GROUP BY MySQL returns a single row with the maximum(average) and a random value for mID.

This is a generic Standard SQL solution:

Select mID, avg(stars) AS avg_stars
From Rating
Group by mID
having avg(stars) =
 ( select max(avg_stars) -- find the max of the averages
   from 
     ( Select mID, avg(stars) AS avg_stars
       From Rating
       Group by mID
     ) T
 )

This might be very inefficient, that's why there are several proprietary syntax extensions. Most DBMSes (but not MySQL) support Standard SQL's Windowed Aggregate Functions:

select *
from
 ( 
   Select mID, avg(stars) AS avg_stars,
      max(avg(stars)) as max_avg
   From Rating
   Group by mID
 ) T
where avg_stars = max_avg

Edit:

As you added SQLite as DBMS, my 2nd query will not work (SQLite doesn't support Analytical Functions, too).

But as WITH is supported you can simplify #1 to a query similar to @user3580870's:

with cte as 
 ( Select mID, avg(stars) AS avg_stars
   From Rating
   Group by mID
 )
select * from cte
where avg_stars =
 ( select max(avg_stars) -- find the max of the averages
   from cte
 );

And this is still Standard SQL compliant...

dnoeth
  • 59,503
  • 4
  • 39
  • 56
  • The question specifically requires to use SQLite. See answer with `WITH` below. Does SQLite support the `max(avg( ))` syntax? – Dan Getz Jan 03 '16 at 16:56
  • hi @dnoeth, "When you return mID without adding it to GROUP BY MySQL returns a single row with the maximum(average) and a random value for mID." Does that happen for SQLite? Could you provide addtional examples like this? – enaJ Jan 03 '16 at 17:00
  • @user3580870: The `sqlite` tag added after I wrote my answer, it was just `sql` in the beginning. – dnoeth Jan 03 '16 at 18:20
  • @enaJ: I don't know about SQLite, it might be similar to MySQL in this case. – dnoeth Jan 03 '16 at 18:21
2

Instead of subquery try using order by and limit to first result:

SELECT mID, AVG(stars) AS avg_stars
FROM Rating
GROUP BY mID
ORDER BY avg_stars DESC LIMIT 1;
SMA
  • 36,381
  • 8
  • 49
  • 73
  • This works for the current table. Will it also work if there are multiple objects with highest value? – enaJ Jan 03 '16 at 15:54
  • @enaJ: No, it will not. – dnoeth Jan 03 '16 at 15:59
  • @enaJ, If the top 2 Movies have equal average rating and if you are trying to get top 2 movies, will you also consider another (3rd record) movie from your point of requirement? – Rajeev Jan 03 '16 at 16:02
  • @Rajeev, I would consider all the movies with the highest score, maybe 2, or 3, or 4 sharing the equally highest score – enaJ Jan 03 '16 at 16:51
2

Perhaps a WITH clause can do the trick. Cannot currently access a live DB to test, but the query should look like:

WITH sq AS 
  (SELECT mID, avg(stars) AS avg_stars FROM rating GROUP BY mID)
SELECT mId,avg_stars FROM sq t1 JOIN 
  (SELECT max(avg_stars) AS max_avg FROM sq) t2 
ON t1.avg_stars = t2.max_avg;

SQLite supports WITH clauses from version 3.8.3. This code was actually tested on the data in the question. It also makes the calculation explicitly more efficient than other solutions (though they may be optimized by some smart query analysis). Furthermore, it is clearest and the WITH clause can offer the path for further twists easily.

Dan Getz
  • 17,002
  • 2
  • 23
  • 41
  • This is what I would use for either: efficiency, educational or readability purposes. Unless your SQLite is an old version. – Dan Getz Jan 03 '16 at 17:05
1

This is just an extension to @SMA's answer.

If you want to consider duplicates records as single entity you can use the following query

SELECT mID,AVG(stars) 
FROM Rating
WHERE AVG(stars) IN (
            (SELECT AVG(stars)
             FROM Rating
             GROUP BY mID
             ORDER BY avg_stars DESC LIMIT 1)
            )
GROUP BY mID

The above query might not be an optimized one.

Community
  • 1
  • 1
Rajeev
  • 843
  • 2
  • 11
  • 22