Here's a hacky approach that uses the max aggregate function, since MySQL has no mode aggregate function (and, before version 8.0, no window functions) that would allow this directly:
SELECT
    tag,
    convert(substring(max(concat(lpad(c, 20, '0'), category)), 21), signed)
        AS most_frequent_category
FROM (
    SELECT tag, category, count(*) AS c
    FROM tags INNER JOIN stuff USING (id)
    GROUP BY tag, category
) AS grouped_cats
GROUP BY tag;
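Basically, it relies on the fact that zero-padding each count to a fixed width makes the lexical max of the padded strings coincide with the numeric max of the counts, and the category appended to each padded count comes along with it. If two categories tie on count, the lexically greater category string wins.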
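To see why the zero-padding matters, here is a minimal sketch that compares two counts as strings, first unpadded and then padded. The values 9 and 10 and the column aliases are only illustrative:

```sql
-- Compared as strings, '9' sorts after '10', so an unpadded max would pick the wrong count.
-- Padding both to the same width makes string order agree with numeric order.
SELECT
    '9' > '10' AS unpadded_9_beats_10,
    lpad(9, 20, '0') > lpad(10, 20, '0') AS padded_9_beats_10;
```

The first column comes out as 1 and the second as 0, which is why the count is padded before being concatenated with the category.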
This is easier to see with named categories:
create temporary table tags (id int auto_increment primary key, tag character varying(20));
create temporary table stuff (id int, category character varying(20));
insert into tags (tag) values ('automotive'), ('ba'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('banana tree'), ('banana tree'), ('banana tree'), ('banana tree'), ('bath');
insert into stuff (id, category) values (1, 'cat-8'), (2, 'cat-8'), (3, 'cat-8'), (4, 'cat-8'), (5, 'cat-8'), (6, 'cat-8'), (7, 'cat-8'), (8, 'cat-10'), (9, 'cat-8'), (10, 'cat-9'), (11, 'cat-8'), (12, 'cat-10'), (13, 'cat-8'), (14, 'cat-9'), (15, 'cat-8'), (16, 'cat-8'), (17, 'cat-8'), (18, 'cat-8'), (19, 'cat-8'), (20, 'cat-9');
In this case we shouldn't do the integer conversion on the most_frequent_category column:
SELECT
    tag,
    substring(max(concat(lpad(c, 20, '0'), category)), 21) AS most_frequent_category
FROM (
    SELECT tag, category, count(*) AS c
    FROM tags INNER JOIN stuff USING (id)
    GROUP BY tag, category
) AS grouped_cats
GROUP BY tag;
+-------------+------------------------+
| tag         | most_frequent_category |
+-------------+------------------------+
| automotive  | cat-8                  |
| ba          | cat-8                  |
| bamboo      | cat-8                  |
| banana tree | cat-8                  |
| bath        | cat-9                  |
+-------------+------------------------+
To dig a little further into what is going on, here's what the grouped_cats inner select returns (I've added order by tag, c desc):
+-------------+----------+---+
| tag         | category | c |
+-------------+----------+---+
| automotive  | cat-8    | 1 |
| ba          | cat-8    | 1 |
| bamboo      | cat-8    | 9 |
| bamboo      | cat-10   | 2 |
| bamboo      | cat-9    | 2 |
| banana tree | cat-8    | 4 |
| bath        | cat-9    | 1 |
+-------------+----------+---+
And we can see how the max of the zero-padded count(*) column drags along its associated category if we omit the substring call:
SELECT
    tag,
    max(concat(lpad(c, 20, '0'), category)) AS xmost_frequent_category
FROM (
    SELECT tag, category, count(*) AS c
    FROM tags INNER JOIN stuff USING (id)
    GROUP BY tag, category
) AS grouped_cats
GROUP BY tag;
+-------------+---------------------------+
| tag         | xmost_frequent_category   |
+-------------+---------------------------+
| automotive  | 00000000000000000001cat-8 |
| ba          | 00000000000000000001cat-8 |
| bamboo      | 00000000000000000009cat-8 |
| banana tree | 00000000000000000004cat-8 |
| bath        | 00000000000000000001cat-9 |
+-------------+---------------------------+
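Since the packed string carries the count as well as the category, the same trick can hand back both. This is only a sketch against the sample tables above; the top_count alias is made up for illustration:

```sql
SELECT
    tag,
    -- characters 21 onward hold the category
    substring(max(concat(lpad(c, 20, '0'), category)), 21) AS most_frequent_category,
    -- the first 20 characters hold the zero-padded count
    convert(substring(max(concat(lpad(c, 20, '0'), category)), 1, 20), unsigned) AS top_count
FROM (
    SELECT tag, category, count(*) AS c
    FROM tags INNER JOIN stuff USING (id)
    GROUP BY tag, category
) AS grouped_cats
GROUP BY tag;
```

With the sample data, bamboo should come back as cat-8 with a top_count of 9.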