6

Lets say I have the following table

+-----------------------+----------------+
| PERSON                | ATE            |
+-----------------------+----------------+
| Joe                   | Chicken        |
| Joe                   | Chicken        |
| Joe                   | Chicken        |
| Joe                   | Tuna           |
| Joe                   | Tuna           |
| Joe                   | Chicken        |
| Joe                   | Tuna           |
| Joe                   | Chicken        |
| Joe                   | Tuna           |
| Joe                   | Tuna           |
| Joe                   | Tuna           |
| Joe                   | Chicken        |
| Joe                   | Tuna           |
| Joe                   | Tuna           |
| Joe                   | Chicken        |
| Joe                   | Tuna           |
| Joe                   | Tuna           |
| Joe                   | Tuna           |
| Rob                   | Chicken        |
| Rob                   | Tuna           |
| Rob                   | Chicken        |
| Rob                   | Chicken        |
| Rob                   | Chicken        |
| Rob                   | Chicken        |
| Jonathan              | Tuna           |
| Jonathan              | Tuna           |
| Jonathan              | Chicken        |
| Jonathan              | Tuna           |
| Jonathan              | Chicken        |
+-----------------------+----------------+

I'd like to find the most frequently eaten food for each person. So, the desired result would be:

+-----------------------+----------------+
| PERSON                | ATE            |
+-----------------------+----------------+
| Joe                   | Tuna           |
| Rob                   | Chicken        |
| Jonathan              | Tuna           |
+-----------------------+----------------+

Naturally, I would GROUP BY (Person). Afterwards, I have to find the most frequently eaten item in each group. If they were numbers it'd be easy since I can just use the MAX function, but I have no idea how to pick out the most popular string among a bunch.

How would you accomplish what I'm trying to do in SQL?

Naldhelaan
  • 390
  • 4
  • 13
  • 1
    Count and then find the maximum of counts. – PM 77-1 Oct 26 '17 at 21:26
  • Max of count would return a number (count). Do I have to then find the food with a count that matches that max_count? Seems kind of complicated for such a simple task – Naldhelaan Oct 26 '17 at 21:32
  • @Naldhelaan it's simple with `row_number` which mysql only supports as of version 8 - https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_row-number – FuzzyTree Oct 26 '17 at 21:35
  • So this isn’t really a table – Strawberry Oct 26 '17 at 21:36
  • Actually, count would count the number of items eaten, not the number of times each item has been eaten. I would have to GROUP BY(ATE) as well for your suggestion to work, hence my initial confusion – Naldhelaan Oct 26 '17 at 21:40
  • @GordonLinoff Did you misread the question? There's nothing about most recent in it. – Barmar Oct 26 '17 at 21:48
  • Write a subquery that gets the count grouped by person and food, then use one of the solutions [here](https://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column?rq=1) to select the maximum row per person. – Barmar Oct 26 '17 at 21:50

2 Answers2

8

You can use a correlated subquery:

select distinct t1.PERSON, (
    select ATE
    from myTable t2
    where t2.PERSON = t1.PERSON
    group by ATE
    order by count(*) desc
    limit 1
) as ATE
from myTable t1

If you have ties, this query will pick one of the most eaten items "randomly".

With MySQL 8 or MariaDB 10.2 (both not stable yet) you will be able to use CTE (Common Table Expression)

with t1 as (
    select PERSON, ATE, count(*) as cnt
    from myTable
    group by PERSON, ATE
), t2 as (
    select PERSON, max(cnt) as cnt
    from t1
    group by PERSON
)
    select *
    from t1
    natural join t2

On ties this query may return multiple rows per group (PERSON).

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
1

Typically you would have to aggregate your data several times to get the count per person and food, then per person, then count again in order to compare ...

But there's a trick: Glue the count with the food:

Joe Chicken      00000000000007Chicken
Joe Tuna         00000000000011Tuna
Jonathan Chicken 00000000000002Chicken 
Jonathan Tuna    00000000000003Tuna 
Rob Chicken      00000000000005Chicken 
Rob Tuna         00000000000001Tuna 

Then get the maximum entry per person (e.g. 00000000000011Tuna for Joe) and strip the number:

select person, substr(max(combined), 15) as ate_most
from
(
  select person, concat(lpad(count(*), 14, '0'), ate) as combined
  from mytable
  group by person, ate
) trick
group by person
order by person;

http://rextester.com/GVO98572

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • Thank you for your answer. Unfortunately this doesn't scale very well to production systems, so I'm hesitant to use it – Naldhelaan Oct 27 '17 at 00:42
  • @Naldhelaan: You are right not to use it. I was thinking too complicatedly. (This may be because I am used to Oracle, where the whole query is much, much simpler: `select person, stats_mode(ate) from mytable group by person`. So I had to think of a more complex pattern than usual and overdid this :-) Paul's answer is much better than mine. – Thorsten Kettner Oct 27 '17 at 07:19