1

I have two tables being joined. They have a one-to-many relationship. I would like to make a query that pulls the most frequent of the "many" table for each of the parent id's.

I have a query like this that is in the beginning phases of what I am trying to do:

SELECT  p.profile, up.value, COUNT(*) AS theCount FROM `profiles` AS p
JOIN user_profile AS up ON p.id = up.profile_id
GROUP BY `profile`, `value`
ORDER BY p.profile ASC, theCount DESC;

Now the problem with this query is that it will show all values from the many table, and just group them with most to least. For example:

If table profile has values of val 1,val 2 and table user_profiles has values of Apple, Apple, Orange with a parent_id of val 1, and Pear for val 2 in the first table, the query above will group them like this:

val 1, Apple, 2

val 1, Orange, 1

val 2, Pear, 1

Now what I WANT is this:

val 1, Apple, 2

val 2, Pear, 1

I only want to show the highest value for the parent. Or if the case there is only one value (e.g. val 2) then show that.

Now, I can easily do this in a subquery (Do a Limit 1 for each parent value; order by theCount DESC). However, is there (1) a way to do this without a subquery? or (2) a MySQL function or some other way to do this? I am trying hard to keep these queries very quick and high performance.

If you can show me that a subquery will not hinder performance here, I will select that as an answer as well.

TheLettuceMaster
  • 15,594
  • 48
  • 153
  • 259
  • 1
    Using the ["greatest N per group" pattern here](https://stackoverflow.com/questions/1442527/how-to-select-the-newest-four-items-per-category) and joining against a subquery that produces the counts, this should be achievable and performant. A joined subquery will not suffer the same severe performance problems a subselect would. – Michael Berkowski Jun 02 '15 at 23:18
  • If you could setup this sample set over at http://sqlfiddle.com with your two tables, that would help. – Michael Berkowski Jun 02 '15 at 23:20
  • @MichaelBerkowski I believe you have the exact idea with what I need based on the link you provided. If you think a subquery would be ok, Then I should try that and compare time on the queries. – TheLettuceMaster Jun 02 '15 at 23:30
  • Although it is a little bit conceptually hacky, Gordon's method using group_concat() and string manips is probably faster and is certainly easier to implement than the user variables method. Try it out. – Michael Berkowski Jun 02 '15 at 23:34
  • @MichaelBerkowski I tried it, it is very fast actually. – TheLettuceMaster Jun 03 '15 at 00:23

1 Answers1

2

With this type of problem, I think the group_concat()/substring_index() trick is the easiest method in MySQL:

SELECT profile, substring_index(group_concat(up.value order by theCount desc), ',', 1) as MostCommonValue,
       theCount
FROM (SELECT  p.profile, up.value, COUNT(*) AS theCount
      FROM `profiles` p JOIN
            user_profile up 
            ON p.id = up.profile_id
      GROUP BY `profile`, `value`
     ) pv
GROUP BY profile
ORDER BY p.profile ASC, theCount DESC;

This method has short-comings. For instance, if value can contain a comma, you need a different separator. More importantly, there is a maximum length to the group_concat() intermediate value (but you set the parameter to have a larger value). But, it works quite well in many cases.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786