0

I have three tables described below:

Area (Id, Description)

City(Id, Name)

Problem(Id, City, Area, Definition):
 City references City (Id), Area references Area (Id)

I want to find the most frequent value of Area(Description) that appears in Problem for each City (Name).

Example:

Area
Id   Description
1      Support
2      Finance  

City
Id      Name
1      Chicago
2      Boston

Problem
Id  City  Area  Definition
1     1     2       A
2     1     2       B
3     1     1       C
4     2     1       D

Desired Output:

 Name         Description
 Chicago        Finance
 Boston         Support

Here's what I have tried with no success :

SELECT Name,
       Description
FROM
  (SELECT *
   FROM Problem AS P,
        City AS C,
        Area AS A
   WHERE C.Id = P.City
     AND A.Id = P.Area ) AS T1
WHERE Description =
    (SELECT Description
     FROM
       (SELECT *
        FROM Problem AS P,
             City AS C,
             Area AS A
        WHERE C.Id = P.City
          AND A.Id = P.Area ) AS T2
     WHERE T1.Name = T2.Name
     GROUP BY Description
     ORDER BY Count(Name) DESC LIMIT 1 )
GROUP BY Name,
         Description

Thanks!

  • Pls post some sample input and desired output... – Teja Nov 14 '16 at 20:37
  • What is the criteria for most recent area without any timestamp field? – Teja Nov 14 '16 at 20:39
  • It's acctually the most frequent value of Area that appears in Problems for each City. It's similar to this problem: http://stackoverflow.com/questions/12235595/find-most-frequent-value-in-sql-column?rq=1 but using three tables – Trapezera Buscando Nov 14 '16 at 20:42
  • Sounds like a Top N Per Group problem. There is a figurative ton of solutions for this already. – Yuck Nov 14 '16 at 20:56
  • I'd like to see some mocked up data and expected results here. The way I see it you want to see the city.name and area.description in the results but I'm not sure what else. The phrase "most frequent value of area" makes no sense to me. If join city and area to problem as inner joins and count id we get the count for each area and city. but since it's aggregrated there is no "most fequent". – xQbert Nov 14 '16 at 21:01

2 Answers2

2

This is probably the shortest way to solve your issue:

select c.Name, a.Description
from City c
cross join Area a
where a.Id = (
    select p.Area
    from Problem p
    where p.City = c.Id
    group by p.Area
    order by count(*) desc, p.Area asc
    limit 1
)

We use a CROSS JOIN to combine every City with every Area. But we pick only the Area with the highest count in the Problem table for the given city, which is determined in the correlated subquery. If two areas have the same highest count for a city, the one coming first alphabetically will be picked (order by ... p.Area asc).

Result:

|    Name | Description |
|---------|-------------|
|  Boston |     Support |
| Chicago |     Finance |

Here's another more complex solution which includes the count.

select c.Name, a.Description, city_area_maxcount.mc as problem_count
from (
    select City, max(c) as mc
    from (
        select p.City, p.Area, count(*) as c
        from problem p
        group by p.City, p.Area
    ) city_area_count
    group by City
) city_area_maxcount
join (
    select p.City, p.Area, count(*) as c
    from problem p
    group by p.City, p.Area
) city_area_count
    on  city_area_count.City = city_area_maxcount.City
    and city_area_count.c = city_area_maxcount.mc
join City c on c.Id = city_area_count.City
join Area a on a.Id = city_area_count.Area

The subquery alisaed as city_area_maxcount is used twice here (i hope mysql can cache the result). If you think of it as a table, that would be a common find-the-row-with-top-value-per-group problem. If two areas have the same highest count for a city, both will be selected.

Result:

|    Name | Description | problem_count |
|---------|-------------|---------------|
|  Boston |     Support |             1 |
| Chicago |     Finance |             2 |

Demo: http://sqlfiddle.com/#!9/c66a5/2

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
1

The Max For each city, and area should be

  select  C.Name, A.Description from (
    select t1.City, t1.Area, max(freq)  as max_freq
    from (
        select P.City, P.Area, count(*) as Freq
        from Problem as P 
        group by P.City, P.Area
    ) t1
  ) t2 
  INNER JOIN City AS C ON t2.City = C.Id
  INNER JOIN Area AS A ON A.Id = t2.Area
ScaisEdge
  • 131,976
  • 10
  • 91
  • 107
  • 1
    trick here is for each city, so more than 1 record will need to be returned – xQbert Nov 14 '16 at 20:44
  • @xQbert . thanks .. i have updated a new anaswer for each city (hope) – ScaisEdge Nov 14 '16 at 21:00
  • I wrote the same query but didn't post it (though I did the joins to city and area on the inner most query).. since you're only displaying name and description why do you need any of the inline views? one we aggregated by city, area max doesn't get us anything new. The problem here is the question itself doesn't make enough sense. (why couldn't we just join problem to city and area and group by c.name and a.description or do a distinct. We'd get the same results w/o all the subqueries. So we're missing something. – xQbert Nov 14 '16 at 21:12
  • @xQbert . the use of so many inline view .. is because .. i think by step when i'm tired ..When I'm a little tired and sleepy I can not abstract the whole query at once .. the use of max .. is because ( i don't know the real data) the query can return more diferent ro for city, area with different count() so the max filter this row using one only (the max) ... and last without a proper set of data and an expected result .. is impossible say if we missing somethings or we ae right .. – ScaisEdge Nov 14 '16 at 21:18
  • 1
    My point is this seems to logically be correct; but it could be achieved just as easily by: `SELECT Distinct C.name, A.Description FROM Problem INNER JOIN City C on C.ID = maxFreq.City INNER JOIN Area A on A.ID = MaxFreq.Area` but that just seems to miss what the authors after, we can't know until they provide better context to the question. – xQbert Nov 14 '16 at 21:20
  • @xQbert . maxFreq.City and maxFreq.Area could be different from maxFreq(City,Area) .. grouping together return different frequncy values respect to separated grouping – ScaisEdge Nov 14 '16 at 21:24
  • I think [this](http://sqlfiddle.com/#!9/c66a5/11) demonstrates what @xQbert means. The problem: the most inner query is grouped by `City, Area`. Thus `City, Area` is distinct in `t1`. Grouping it again by `City, Area` in `t2` doesn't make sence, because there's only one row per group. So `t1` and `t2` are equal. – Paul Spiegel Nov 14 '16 at 23:27