1

I'm attempting to create an SQL query that retrieves the total_cost for every row in a table. Alongside that, I also need to collect the most dominant value for both columnA and columnB, with their respective values.

For example, with the following table contents:

cost columnA columnB target
250 Foo Bar XYZ
200 Foo Bar XYZ
150 Bar Bar ABC
250 Foo Bar ABC

The result would need to be:

total_cost columnA_dominant columnB_dominant columnA_value columnB_value
850 Foo Bar 250 400

Now I can get as far as calculating the total cost - that's no issue. I can also get the most dominant value for columnA using this answer. But after this, I'm not sure how to also get the dominant value for columnB and the values too.

This is my current SQL:

SELECT 
    SUM(`cost`) AS `total_cost`,
    COUNT(`columnA`) AS `columnA_dominant` 
FROM `table`
GROUP BY `columnA_dominant`
ORDER BY `columnA_dominant` DESC
WHERE `target` = "ABC"

UPDATE: Thanks to @Barmar for the idea of using a subquery, I managed to get the dominant values for columnA and columnB:

SELECT 
    -- Retrieve total cost.
    SUM(`cost`) AS `total_cost`,
    -- Get dominant values.
    (
        SELECT `columnA`
        FROM `table`
        GROUP BY `columnA`
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) AS `columnA_dominant`,
    (
        SELECT `columnB`
        FROM `table`
        GROUP BY `columnB`
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) AS `columnB_dominant`
FROM `table`
WHERE `target` = "XYZ"

However, I'm still having issues figuring out how to calculate the respective values.

GROVER.
  • 4,071
  • 2
  • 19
  • 66
  • 2
    You need a subquery for each column to get the most common value of that column. Join all these subqueries to get all of them in a single row. – Barmar May 13 '22 at 00:10
  • @Barmar See update. – GROVER. May 13 '22 at 01:10
  • Put the queries that get the most dominant value in a subquery. Then you can do `SUM(columnA = columnA_dominant) / COUNT(*) AS columnA_percentage` – Barmar May 13 '22 at 01:16

2 Answers2

3

You might get close, if we want to get percentage values we can try to add COUNT(*) at subquery to get max count by columnA and columnB then do division by total count

SELECT 
    SUM(cost),
    (
        SELECT tt.columnA
        FROM T tt
        GROUP BY tt.columnA
        ORDER BY COUNT(*) DESC
        LIMIT 1
    )  AS columnA_dominant,
    (
        SELECT tt.columnB
        FROM T tt
        GROUP BY tt.columnB
        ORDER BY COUNT(*) DESC
        LIMIT 1
    )  AS columnB_dominant,
    (
        SELECT COUNT(*)
        FROM T tt
        GROUP BY tt.columnA
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) / COUNT(*) AS columnA_percentage,
    (
        SELECT COUNT(*)
        FROM T tt
        GROUP BY tt.columnB
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) / COUNT(*) AS columnB_percentage
FROM T t1

If your MySQL version supports the window function, there is another way which reduce table scan might get better performance than a correlated subquery

SELECT SUM(cost) OVER(),
       FIRST_VALUE(columnA) OVER (ORDER BY counter1 DESC) columnA_dominant,
       FIRST_VALUE(columnB) OVER (ORDER BY counter2 DESC) columnB_dominant,
       FIRST_VALUE(counter1) OVER (ORDER BY counter1 DESC) / COUNT(*) OVER() columnA_percentage,
       FIRST_VALUE(counter2) OVER (ORDER BY counter2 DESC) / COUNT(*) OVER() columnB_percentage
FROM (
  SELECT *,
         COUNT(*) OVER (PARTITION BY columnA) counter1,
         COUNT(*) OVER (PARTITION BY columnB) counter2  
  FROM T
) t1
LIMIT 1

sqlfiddle

D-Shih
  • 44,943
  • 6
  • 31
  • 51
  • @GROVER. We can just add `SUM` instead of `COUNT` from the query https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=fec41c4bd40439fac7532fbdd5c8c49b – D-Shih May 13 '22 at 03:13
  • If I add a where condition to the main query, for some reason the subqueries return the incorrect results? Ie. `WHERE `columnC` = "XYZ"`. Do I need to add the same where condition to ALL the subqueries? – GROVER. May 13 '22 at 05:08
  • Yes, You need to add conditions for that or compare `columnC` which might equal the subquery table and main table. – D-Shih May 13 '22 at 05:15
  • See updated question. Might make more sense lol :-) – GROVER. May 13 '22 at 05:18
1

try this query

select sum(cost) as total_cost,p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
from get_common,(
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
)p,
(select top 1 columnB,columnB_percentage
from (
select columnB,count(columnB) as count_columnB, cast(count(columnB) as float)/(select count(columnB) from get_common) as columnB_percentage
from get_common
group by columnB) t
order by count_columnB desc)q
group by p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage

so if you want to get the percent and dominant value you must make their own query like this

select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc

then you can join with the sum query to get all value you want

hope this can help you

kebis
  • 66
  • 2
  • Sorry, I realised the way I was going about things was incorrect. I was meant to get their respective `values` not their percentages. See updated question. – GROVER. May 13 '22 at 03:08