SQL How to find the most popular, AND order by month

Question

I want to find the most popular value and order the result by month. I mean by this, the value that appeared the most by month.

I made a lot of research and some answers to find the most popular product. Usually, the solution is a group by and order by with a limit 1. Like this : Find most frequent value in SQL column

Here is my example table. Let's try to find the favorite colors of everyone.

Table picked_colors

| Color  | Name | Month |
|--------|------|-------|
| Yellow | Tom  | 04    |
| Red    | Tom  | 04    |
| Yellow | Tom  | 04    |
| Blue   | Tom  | 03    |
| Red    | Sam  | 04    |
| Pink   | Sam  | 04    |
| Pink   | Sam  | 04    |
| Yellow | Sam  | 03    |

So it seems that in April, Tom preferred the yellow color (he picked it two times vs one time the red one), and he preferred the blue in march. And Sam favorite's one is pink for April, and yellow for march.

So I would like a table that return where Name = Tom

Month / color
04 / Yellow
03 / blue

and where Name = Sam

Month / color
04 / Pink
03 / Yellow

I tried somethings not really working like, for example with this, I can find how many times Tom liked each colors every month. With this i could select the highest value as the favorite one but I don't know how :

SELECT MONTH(month), color, COUNT(color)
FROM picked_colors
WHERE name = 'Tom'
GROUP BY color, MONTH(month)
ORDER BY MONTH(month) desc;

it returns :

Month / color / Count
04 / yellow / 2
04 / red / 1
03 / blue / 1

Actually it is exactly what i need, except the second line i do not want. I thought of something with max(count()) but it is not allowed by mysql.

Anyone ? Thanks a lot.

Show us the "lot of stuff" you tried. We need to know how you're currently attempting to solve this, as you very well might have most of the solution already. — user2366842, Jul 27 '17 at 15:27
What kind of result do you want if there's a tie for the most preferred item? — Jeff Breadner, Jul 27 '17 at 22:43
@user2366842 I tried to make it more clear. Is it ok for you ? — Cyril, Jul 27 '17 at 22:48
@JeffBreadner Ho. I didn't thought about that... I guess it would select a random one of the tie. — Cyril, Jul 27 '17 at 22:48

Jeff Breadner · Accepted Answer · 2017-07-28T20:53:35.867

This code works.

Because you are likely to have duplicates, and you only want one answer for each user / month combination, we have to get more advanced with the solution.

This solution first aggregates by how many rows there are for each combination we care about (the aggregated_picked_colors subquery), then assigns a ranking of the number of colors in each combination we care about (the ranked_picked_colors subquery). The top level query then just pulls out the top ranked solution from ranked_picked_colors.

More detail on the method for incrementing these ranking column counters is available here: ROW_NUMBER() in MySQL

/*
create table picked_colors (
  color varchar(10),
  name varchar(10),
  month tinyint
);

insert into picked_colors (color, name, month) values ('Yellow', 'Tom', 4);
insert into picked_colors (color, name, month) values ('Red',    'Tom', 4);
insert into picked_colors (color, name, month) values ('Yellow', 'Tom', 4);
insert into picked_colors (color, name, month) values ('Blue',   'Tom', 3);
insert into picked_colors (color, name, month) values ('Red',    'Sam', 4);
insert into picked_colors (color, name, month) values ('Pink',   'Sam', 4);
insert into picked_colors (color, name, month) values ('Pink',   'Sam', 4);
insert into picked_colors (color, name, month) values ('Yellow', 'Sam', 3);
 */

select
  name, month, color
from
  (
    select
      @row_num := IF(@prev_value=concat_ws('', name, month),@row_num+1,1) as row_number,
      color, name, month, row_count,
      @prev_value := concat_ws('', name, month) as prev_value
    from
      (
        select
          name, month, color, count(*) as row_count
        from
          picked_colors
        group by
          name, month, color
      ) aggregated_picked_colors,
      (select @row_num := 1) x,
      (select @prev_value := '') y
    order by
      name, month, row_count desc
  ) ranked_picked_colors
where
  row_number = 1
order by
  name, month desc

Returns:

+------+-------+--------+
| name | month | color  |
+------+-------+--------+
| Sam  |     4 | Pink   |
| Sam  |     3 | Yellow |
| Tom  |     4 | Yellow |
| Tom  |     3 | Blue   |
+------+-------+--------+

Edit: Adding explanation / walkthrough

The main objective here is that we want to take each set of data that we're aggregating (in this case, each combination of name / month) and sort the colors by how many rows they have. We then want to insert a new column into these results with an explicit ranking of which color has the most rows, which color has the next most rows, etc.

When we start, we're not really ready to rank these things, we have multiple rows that say "yellow", but instead we want the data to show "yellow has 5 rows" or something along those lines. So, the first query we write does this aggregation:

select
  name, month, color, count(*) as row_count
from
  picked_colors
group by
  name, month, color;

This returns:

+------+-------+--------+-----------+
| name | month | color  | row_count |
+------+-------+--------+-----------+
| Sam  |     3 | Yellow |         1 |
| Sam  |     4 | Pink   |         2 |
| Sam  |     4 | Red    |         1 |
| Tom  |     3 | Blue   |         1 |
| Tom  |     4 | Red    |         1 |
| Tom  |     4 | Yellow |         2 |
+------+-------+--------+-----------+

This shows, for each Name and Month, how many rows each Color has.

Next, we want to show, for each Name and Month, which Color has the most rows, which Color has the second most rows, etc. This logic is the most convoluted.

The idea here is that in the table definitions, we insert these lines:

(select @row_num := 1) x,
(select @prev_value := '') y

These commands basically initialize these variables. The X and Y names are only provided because we have to give these "subqueries" a name, they're not actually used anywhere.

Inside of the query, we use these variables. Effectively, they check to see if the Color / Name combination has changed from the previous row. If they do change, then we set @row_num to 1, if they don't change, then we increment it. We have to be careful to sort this query by the same criteria that we're comparing adjacent rows; changing the sort order here will break the logic.

select
  @row_num := IF(@prev_value=concat_ws('', name, month),@row_num+1,1) as row_number,
  color, name, month, row_count,
  @prev_value := concat_ws('', name, month) as prev_value
from
  (
    select
      name, month, color, count(*) as row_count
    from
      picked_colors
    group by
      name, month, color
  ) aggregated_picked_colors,
  (select @row_num := 1) x,
  (select @prev_value := '') y
order by
  name, month, row_count desc;

This returns:

+------------+--------+------+-------+-----------+------------+
| row_number | color  | name | month | row_count | prev_value |
+------------+--------+------+-------+-----------+------------+
|          1 | Yellow | Sam  |     3 |         1 | Sam3       |
|          1 | Pink   | Sam  |     4 |         2 | Sam4       |
|          2 | Red    | Sam  |     4 |         1 | Sam4       |
|          1 | Blue   | Tom  |     3 |         1 | Tom3       |
|          1 | Yellow | Tom  |     4 |         2 | Tom4       |
|          2 | Red    | Tom  |     4 |         1 | Tom4       |
+------------+--------+------+-------+-----------+------------+

We can ignore the 'prev_value' column here, we don't actually want these results, we're just setting this variable to be used when we inspect the next row. The important thing is to see when we have the same Name and Month, the color with the highest row_count has row_number = 1, the color with the next highest row_count has row_number = 2, etc.

The last step is to query out just the fields we want, and just the rows where row_number = 1. These are the rows that correspond to the highest frequency color for each name/month combination.

select
  name, month, color
from
  (
    select
      @row_num := IF(@prev_value=concat_ws('', name, month),@row_num+1,1) as row_number,
      color, name, month, row_count,
      @prev_value := concat_ws('', name, month) as prev_value
    from
      (
        select
          name, month, color, count(*) as row_count
        from
          picked_colors
        group by
          name, month, color
      ) aggregated_picked_colors,
      (select @row_num := 1) x,
      (select @prev_value := '') y
    order by
      name, month, row_count desc
  ) ranked_picked_colors
where
  row_number = 1
order by
  name, month desc

This returns:

+------+-------+--------+
| name | month | color  |
+------+-------+--------+
| Sam  |     4 | Pink   |
| Sam  |     3 | Yellow |
| Tom  |     4 | Yellow |
| Tom  |     3 | Blue   |
+------+-------+--------+

Thank you so much. Could you explain the code ? It is quite advanced and I cannot understand everything in it. — Cyril, Jul 28 '17 at 07:53
I'll write a detailed walkthrough in a few hours, maybe towards the end of the day. Sorry for the delay, I need to get through my work day first. — Jeff Breadner, Jul 28 '17 at 14:40
I red your detailed walkthrough. It is really clear now. Thank you so much for your time and explainations. — Cyril, Jul 28 '17 at 22:20

SQL How to find the most popular, AND order by month

1 Answers1