How to count only the first 5 items and then group the remaining under 'other' bin?

Question

I have a table like this;

+----+---------+-------------+
| id | user_id | screenWidth |
+----+---------+-------------+
|  1 |       1 |        1366 |
|  2 |       1 |        1366 |
|  3 |       1 |        1366 |
|  4 |       1 |        1366 |
|  5 |       2 |        1920 |
|  6 |       2 |        1920 |
|  7 |       3 |        1920 |
|  8 |       4 |        1280 |
|  9 |       5 |        1280 |
| 10 |       6 |        1280 |
| 11 |       7 |        1890 |
| ...|   ...   |     ...     |
| ...|   ...   |     ...     |
| ...|   ...   |     ...     |
| 100|       6 |        1910 |
+----+---------+-------------+

Where there are lots of screenWidths, but 90% of them are equal to one of 5 values.

Using a query like:

SELECT      screenwidth
        ,   COUNT(DISTINCT user_id) AS screenwidthcount
FROM        screenwidth
GROUP BY    screenwidth
ORDER BY    screenwidthcount;

(Thanks from How do I count only the first occurrence of a value?)

I get a nice count for the number of times a screenWidth has occurred, counting only once per user.

Is there a way to count the most popular screenWidths, then collect all the others in a category called "other" - that is to say, instead of the query above returning loads of rows, it returns 6, the first 5 being the first 5 it returns currently, the 6th being called other with the sum of the rest of the values?

Have you considered using MySql's `RANK` function? – Marc Apr 29 '12 at 15:43 — Marc, Apr 29 '12 at 15:43
Correction: maybe there's not a `RANK` function in MySql. – Marc Apr 29 '12 at 15:48 — Marc, Apr 29 '12 at 15:48

score 3 · Accepted Answer · edited May 23 '17 at 12:29

Here is one way to do it. Following script was created based on the answer to this question Rank function in MySQL

The query assigns a ranking to all the rows for which distinct count has been computer. I have assigned a value of 2 in the CASE expressions. This denotes that the script will display the top 2 screen widths and the remaining will be clubbed into Other. You need to change the value according to your requirement. I have hard coded the value 99999 to group all the other rows.

There might be a better way to do this but this is one of the ways I could make it to work.

Click here to view the demo in SQL Fiddle.

Script:

CREATE TABLE screenwidth 
(
    id INT NOT NULL
  , user_id INT NOT NULL
  , screenwidth INT NOT NULL
);

INSERT INTO screenwidth (id, user_id, screenwidth) VALUES
  (1, 1, 1366),
  (2, 2, 1366),
  (3, 2, 1366),
  (4, 2, 1366),
  (5, 3, 1366),
  (6, 1, 1920),
  (7, 2, 1920),
  (8, 1, 1440),
  (9, 2, 1440),
  (10, 3, 1440),
  (11, 4, 1440),
  (12, 1, 1280),
  (13, 1, 1024),
  (14, 2, 1024),
  (15, 3, 1024),
  (16, 3, 1024),
  (17, 3, 1024),
  (18, 1, 1366);

SELECT screenwidth
    , SUM(screenwidthcount) AS screenwidth_count
FROM
(
    SELECT      CASE    
                    WHEN @curRank < 2 THEN screenwidth 
                    ELSE 'Other' 
                END AS screenwidth
            ,   screenwidthcount
            ,   @curRank := 
                (   CASE 
                        WHEN @curRank < 2 THEN @curRank + 1 
                        ELSE 99999
                    END
                ) AS rank
    FROM
    (
        SELECT      screenwidth
                ,   COUNT(DISTINCT user_id) AS screenwidthcount
        FROM        screenwidth
        GROUP BY    screenwidth
        ORDER BY    screenwidthcount DESC
    ) T1
                ,   (SELECT @curRank := 0) r
) T2
GROUP BY    screenwidth
ORDER BY    rank;

Output:

SCREENWIDTH SCREENWIDTH_COUNT
----------- -----------------
1440               4
1024               3
Other              6

Michael Buen · Answer 2 · 2012-05-01T09:20:09.083

Try this:

select

  case when rank <= 5 then rank else 'Other' end as screenwidth, 

  sum(screenwidthcount) as screenwidthcount,

  least(rank,6) as LimitRank

from
(
  SELECT
  *, (@r := @r + 1) as rank
  FROM
  (
    SELECT      screenwidth
            ,   COUNT(DISTINCT user_id) AS screenwidthcount

    FROM        tbl

    GROUP BY    screenwidth
    ORDER BY    screenwidthcount desc, screenwidth desc
  ) AS X
  cross join (select @r := 0 as init ) rx
) as y

group by LimitRank

Data sample:

CREATE TABLE tbl
    (id int, user_id int, screenWidth int);

INSERT INTO tbl
    (id, user_id, screenWidth)
VALUES
    (1, 1, 1366),
    (2, 1, 1366),
    (3, 1, 1366),
    (4, 1, 1366),
    (5, 2, 1920),
    (6, 2, 1920),
    (7, 3, 1920),
    (8, 4, 1280),
    (9, 5, 1280),
    (10, 6, 1280),
    (11, 7, 1890),
    (12, 9, 1890),
    (13, 9, 1890),
    (13, 9, 1024),
    (13, 9, 800),
    (100, 6, 1910);

Output:

SCREENWIDTH SCREENWIDTHCOUNT    LIMITRANK
1280        3                   1
1920        2                   2
1890        2                   3
1910        1                   4
1366        1                   5
Other       2                   6

Live test: http://www.sqlfiddle.com/#!2/c0e94/33

Here's the uncapped results: http://www.sqlfiddle.com/#!2/c0e94/31

SCREENWIDTH SCREENWIDTHCOUNT
1280        3
1920        2
1890        2
1910        1
1366        1
1024        1
800         1

Charles Bretana · Answer 3 · 2012-04-29T15:56:54.493

Yes, with the ubiuquitous case statement: I don't have MySQL, but this, or something like this, should work...

A. Inner Select generates resultset of screnwidth, and the count of distinct users that have that screenwidth... (this effectively counts each screnwidth only once per user). Result set is limited to only those screenwidths used by five or more users.

B. Then outer query joins the complete table to that resultset, grouping on an expression and summing "Cnt" that represents the number of users using each screenwidth.

   Select case When Z.Cnt < 5 Then screnwidth, else 0 end
       Sum(Z.Cnt) screenwidthcount, 
   From screenwidth A
      Left Join (Select screenwidth, Count(Distinct User_ID) Cnt
                 From screenwidth
                 Group By screenwidth
                 Having count(*) > 4) Z
        On Z.screeenwidth = A.screeenwidth         
   Group By case When Z.Cnt < 5 Then screnwidth, else 0 end

C. If MySql has a function like SQL Servers Str() function, you can use that to convert the case expression to a string, then insteaed of the 0 after the else, you can use 'other'

   Select case When Z.Cnt < 5 Then Str(screnwidth, 6,0) else 'other' end
       Sum(Z.Cnt) screenwidthcount, 
   From screenwidth A
      Left Join (Select screenwidth, Count(Distinct User_ID) Cnt
                 From screenwidth
                 Group By screenwidth
                 Having count(*) > 4) Z
        On Z.screeenwidth = A.screeenwidth         
   Group By case When Z.Cnt < 5 Then Str(screnwidth, 6,0) else 'other'  end

The OP isn't asking for screen widths with 5 or more users, but rather the 5 most popular screen widths. — MatBailie, Apr 29 '12 at 17:04
except that he says "counting only once per user", which makes it, effectively, counting users. — Charles Bretana, Apr 30 '12 at 03:04
Sure, use `COUNT(DISTINCT user_id)` to rank the screen widths. But then you need a `LIMIT 5` type answer; If every screen width has 5+ users you're return every screen width, where as the OP wants the 5 highest ranked only. — MatBailie, Apr 30 '12 at 08:49
Re: "I don't have MySQL" This could not be an excuse anymore. You can use http://sqlfiddle.com You can even use its Text to DDL to quickly prototype the DDL based on tab-delimited,box-delimited(Postgresql and MySql output) and CSV-delimited data. See this example, the DDL and its data samples came from copy-pasted OP's sample data: http://www.sqlfiddle.com/#!2/7b1d2 — Michael Buen, May 01 '12 at 01:22

How to count only the first 5 items and then group the remaining under 'other' bin?

3 Answers3