0

I have a table, called top_trends which has the following schema:

id int(11) AI PK 
criteria_id int(11) 
value varchar(255) 
created_at timestamp 
updated_at timestamp

Then I have another table, called search_criterias which has the following schema:

id int(11) AI PK 
title varchar(255) 
created_at timestamp 
updated_at timestamp

Here is a query which provides the value based on maximum number of records by a value. So, if theres 2 records in top_trends with both having criteria_id as 1 and both have values of 3 in the top_trends.value column, and then a separate SINGLE record with the same criteria_id 1 but a value of 2, the query will produce a result of the selected criteria (being 1) having a value of 3 since the value 3 occurred more times than any other rows of values with criteria_id 1. So, in simple terms, the query chose the value 3 for criteria id 1 because that occurred the most amount of times based on the records in the top_trends having criteria_id 1

select
  x.value as `values`
  , sc.id as id
  , sc.title
  , sc.created_at
  , sc.updated_at
  , x.criteria_id as search_category_id 
from
  ( 
    select
      criteria_id
      , `value` 
    from
      top_trends 
    group by
      `criteria_id` 
    order by
      `value`
  ) x 
  left join search_criterias sc 
    on sc.id = x.criteria_id 
group by
  criteria_id

My issue is that unfortunately right now we dont have data for all possible search_criterias so some of the records in the search_criteria table are not being aggregated in my query.

For example, we have a search_criteria record of city, with an id of 5, but no records in the top_trends table with having a criteria_id of 5... so the query above is not including that search_criteria.

What I'd like to do is include those records in the search_criteria table that are not in the top_trends table but have the values attribute as null

Antony Jack
  • 480
  • 2
  • 16
somejkuser
  • 8,856
  • 20
  • 64
  • 130
  • Your subquery is not the correct way to get the most common value for each criteria ID. – Barmar Mar 25 '20 at 02:08
  • ill retest tomorrow but it should be. it selects values ordered than slices the top? @Barmar – somejkuser Mar 25 '20 at 02:09
  • You should be using `RIGHT JOIN` if the missing rows are in `top_trends`. `LEFT JOIN` would be appropriate if the missing rows are in `search_criteria`. – Barmar Mar 25 '20 at 02:09
  • There's no slicing. It picks a random value in each group, then orders by that. – Barmar Mar 25 '20 at 02:10
  • @Barmar then ill add a DESC clause to the order by? – somejkuser Mar 25 '20 at 02:10
  • It doesn't matter how you order, it's still picking a random value for each group. – Barmar Mar 25 '20 at 02:10
  • @Barmar im going to sleep ill review this tomorrow if you have any good answers for me to look at in the morning thatd be great. You understand what I'm trying to achieve? – somejkuser Mar 25 '20 at 02:11
  • See https://stackoverflow.com/questions/12446368/sql-returning-the-most-common-value-for-each-person for how to get the most common value in a group. – Barmar Mar 25 '20 at 02:11
  • This problem is unrelated to the problem in your question. The answer to that is to switch the order of the `LEFT JOIN` or change it to `RIGHT JOIN`. – Barmar Mar 25 '20 at 02:12

1 Answers1

0

LEFT JOIN is used when there are rows in the left table that have no match in the right table. Since the missing rows in your case are in top_trends, that should be the right table of the LEFT JOIN. So switch the order of the join.

select
  x.value as `values`
  , sc.id as id
  , sc.title
  , sc.created_at
  , sc.updated_at
  , x.criteria_id as search_category_id 
from search_criterias sc 
left join
  ( 
    select
      criteria_id
      , `value` 
    from
      top_trends 
    group by
      `criteria_id` 
    order by
      `value`
  ) x 
    on sc.id = x.criteria_id 
group bysc.id
Barmar
  • 741,623
  • 53
  • 500
  • 612