This code works.
Because you are likely to have duplicates, and you only want one answer for each user / month combination, we have to get more advanced with the solution.
This solution first aggregates by how many rows there are for each combination we care about (the aggregated_picked_colors subquery), then assigns a ranking of the number of colors in each combination we care about (the ranked_picked_colors subquery). The top level query then just pulls out the top ranked solution from ranked_picked_colors.
More detail on the method for incrementing these ranking column counters is available here: ROW_NUMBER() in MySQL
/*
create table picked_colors (
color varchar(10),
name varchar(10),
month tinyint
);
insert into picked_colors (color, name, month) values ('Yellow', 'Tom', 4);
insert into picked_colors (color, name, month) values ('Red', 'Tom', 4);
insert into picked_colors (color, name, month) values ('Yellow', 'Tom', 4);
insert into picked_colors (color, name, month) values ('Blue', 'Tom', 3);
insert into picked_colors (color, name, month) values ('Red', 'Sam', 4);
insert into picked_colors (color, name, month) values ('Pink', 'Sam', 4);
insert into picked_colors (color, name, month) values ('Pink', 'Sam', 4);
insert into picked_colors (color, name, month) values ('Yellow', 'Sam', 3);
*/
select
name, month, color
from
(
select
@row_num := IF(@prev_value=concat_ws('', name, month),@row_num+1,1) as row_number,
color, name, month, row_count,
@prev_value := concat_ws('', name, month) as prev_value
from
(
select
name, month, color, count(*) as row_count
from
picked_colors
group by
name, month, color
) aggregated_picked_colors,
(select @row_num := 1) x,
(select @prev_value := '') y
order by
name, month, row_count desc
) ranked_picked_colors
where
row_number = 1
order by
name, month desc
Returns:
+------+-------+--------+
| name | month | color |
+------+-------+--------+
| Sam | 4 | Pink |
| Sam | 3 | Yellow |
| Tom | 4 | Yellow |
| Tom | 3 | Blue |
+------+-------+--------+
Edit: Adding explanation / walkthrough
The main objective here is that we want to take each set of data that we're aggregating (in this case, each combination of name / month) and sort the colors by how many rows they have. We then want to insert a new column into these results with an explicit ranking of which color has the most rows, which color has the next most rows, etc.
When we start, we're not really ready to rank these things, we have multiple rows that say "yellow", but instead we want the data to show "yellow has 5 rows" or something along those lines. So, the first query we write does this aggregation:
select
name, month, color, count(*) as row_count
from
picked_colors
group by
name, month, color;
This returns:
+------+-------+--------+-----------+
| name | month | color | row_count |
+------+-------+--------+-----------+
| Sam | 3 | Yellow | 1 |
| Sam | 4 | Pink | 2 |
| Sam | 4 | Red | 1 |
| Tom | 3 | Blue | 1 |
| Tom | 4 | Red | 1 |
| Tom | 4 | Yellow | 2 |
+------+-------+--------+-----------+
This shows, for each Name and Month, how many rows each Color has.
Next, we want to show, for each Name and Month, which Color has the most rows, which Color has the second most rows, etc. This logic is the most convoluted.
The idea here is that in the table definitions, we insert these lines:
(select @row_num := 1) x,
(select @prev_value := '') y
These commands basically initialize these variables. The X and Y names are only provided because we have to give these "subqueries" a name, they're not actually used anywhere.
Inside of the query, we use these variables. Effectively, they check to see if the Color / Name combination has changed from the previous row. If they do change, then we set @row_num to 1, if they don't change, then we increment it. We have to be careful to sort this query by the same criteria that we're comparing adjacent rows; changing the sort order here will break the logic.
select
@row_num := IF(@prev_value=concat_ws('', name, month),@row_num+1,1) as row_number,
color, name, month, row_count,
@prev_value := concat_ws('', name, month) as prev_value
from
(
select
name, month, color, count(*) as row_count
from
picked_colors
group by
name, month, color
) aggregated_picked_colors,
(select @row_num := 1) x,
(select @prev_value := '') y
order by
name, month, row_count desc;
This returns:
+------------+--------+------+-------+-----------+------------+
| row_number | color | name | month | row_count | prev_value |
+------------+--------+------+-------+-----------+------------+
| 1 | Yellow | Sam | 3 | 1 | Sam3 |
| 1 | Pink | Sam | 4 | 2 | Sam4 |
| 2 | Red | Sam | 4 | 1 | Sam4 |
| 1 | Blue | Tom | 3 | 1 | Tom3 |
| 1 | Yellow | Tom | 4 | 2 | Tom4 |
| 2 | Red | Tom | 4 | 1 | Tom4 |
+------------+--------+------+-------+-----------+------------+
We can ignore the 'prev_value' column here, we don't actually want these results, we're just setting this variable to be used when we inspect the next row. The important thing is to see when we have the same Name and Month, the color with the highest row_count has row_number = 1, the color with the next highest row_count has row_number = 2, etc.
The last step is to query out just the fields we want, and just the rows where row_number = 1. These are the rows that correspond to the highest frequency color for each name/month combination.
select
name, month, color
from
(
select
@row_num := IF(@prev_value=concat_ws('', name, month),@row_num+1,1) as row_number,
color, name, month, row_count,
@prev_value := concat_ws('', name, month) as prev_value
from
(
select
name, month, color, count(*) as row_count
from
picked_colors
group by
name, month, color
) aggregated_picked_colors,
(select @row_num := 1) x,
(select @prev_value := '') y
order by
name, month, row_count desc
) ranked_picked_colors
where
row_number = 1
order by
name, month desc
This returns:
+------+-------+--------+
| name | month | color |
+------+-------+--------+
| Sam | 4 | Pink |
| Sam | 3 | Yellow |
| Tom | 4 | Yellow |
| Tom | 3 | Blue |
+------+-------+--------+