54

I have a table with 3 columns which looks like this:

File    User     Rating (1-5)
------------------------------
00001    1        3
00002    1        4
00003    2        2
00004    3        5
00005    4        3
00005    3        2
00006    2        3
Etc.

I want to generate a query that outputs the following (for each user and rating, display the number of files as well as percentage of files):

User    Rating   Count   Percentage
-----------------------------------
1       1         3      .18
1       2         6      .35
1       3         8      .47
2       5         12     .75
2       3         4      .25

With Postgresql, I know how to create a query that includes the first 3 columns using the following query, but I can't figure out how to calculate percentage within the GROUP BY:

SELECT
    User,
    Rating,
    Count(*)
FROM
    Results
GROUP BY
    User, Rating
ORDER BY
    User, Rating

Here I want the percentage calculation to apply to each user/rating group.

ekad
  • 14,436
  • 26
  • 44
  • 46
DeadMonkey
  • 775
  • 2
  • 6
  • 6

5 Answers5

51
WITH t1 AS 
 (SELECT User, Rating, Count(*) AS n 
  FROM your_table
  GROUP BY User, Rating)
SELECT User, Rating, n, 
       (0.0+n)/(COUNT(*) OVER (PARTITION BY User)) -- no integer divide!
FROM t1;

Or

SELECT User, Rating, Count(*) OVER w_user_rating AS n, 
        (0.0+Count(*) OVER w_user_rating)/(Count(*) OVER (PARTITION BY User)) AS pct
FROM your_table
WINDOW w_user_rating AS (PARTITION BY User, Rating);

I would see if one of these or the other yields a better query plan with the appropriate tool for your RDBMS.

James Boutcher
  • 2,593
  • 1
  • 25
  • 37
Andrew Lazarus
  • 18,205
  • 3
  • 35
  • 53
  • 2
    Thank you Andrew! I slightly modified version of your second query worked: `select user, rating, cnt, cnt::float * 100/(sum(cnt) over (partition by user)) from (select user, rating, count(*) as cnt from tbl group by user, rating) a order by user,rating` – DeadMonkey Jun 01 '11 at 21:22
  • 5
    Do you need to use SUM(COUNT(*)) in both examples? Something like `(SUM(COUNT(*)) OVER (PARTITION BY User))` in the first example? With SUM I get the expected values, otherwise I'm dividing by the number of Ratings instead of the sum of their counts. – Tyler DeWitt Oct 28 '13 at 16:11
  • @TylerDeWitt, I think so, see DeadMonkey's comment above. – Andrew Lazarus Oct 28 '13 at 20:22
12

Alternatively, you can do the old-school way — arguably easier to grok:

select usr.User                   as User   ,
       usr.Rating                 as Rating ,
       usr.N                      as N      ,
       (100.0 * usr.N) / total.N as Pct
from ( select User, Rating , count(*) as N
       from Results
       group by User , Rating
     ) usr
join ( select User , count(*) as N
       from Results
       group by User
     ) total on total.User = usr.User
order by usr.User, usr.Rating

Cheers!

user1717828
  • 7,122
  • 8
  • 34
  • 59
Nicholas Carey
  • 71,308
  • 16
  • 93
  • 135
  • Thank you! This works too without using window functions. I had to change "item" to "usr" in the query above and remove the first count(*). – DeadMonkey Jun 01 '11 at 21:42
5

The best way to do this would be with window functions.

mikerobi
  • 20,527
  • 5
  • 46
  • 42
3

In TSQL this should work

SELECT
    User,
    Rating,
    Count(*), SUM(COUNT(*)) OVER (PARTITION BY User, Rating ORDER BY User, Rating) AS Total,
Count(*)/(SUM(COUNT(*)) OVER (PARTITION BY User, Rating ORDER BY User, Rating)) AS Percentage
FROM
    Results
GROUP BY
    User, Rating
ORDER BY
    User, Rating
James Holland
  • 1,102
  • 10
  • 17
0
WITH data AS 
 (SELECT User, Rating, Count(*) AS Count 
  FROM Results
  GROUP BY User, Rating)
SELECT User, Rating, Count, 
       (0.0+n)/(SUM(Count) OVER (PARTITION BY User))
FROM data;
Jason Aller
  • 3,541
  • 28
  • 38
  • 38
user3060544
  • 269
  • 2
  • 5