0

I'm working on a case that I don't know how to solve, the original table looks like this:

id typeA typeB typeC
1 3 5 1
2 2 4 5
3 2 2 1

I want to add a classification based on the value from typeA, B and C. The classification would be the one that has the highest value, and if there's a tie, I will push can't decide.

So the output looks like this.

id typeA typeB typeC classification
1 3 5 1 typeB
2 2 4 5 typeC
3 2 2 1 can't decide

I know I can use CASE WHEN in this case, but we have many "type" columns (40+), Is there a better way to do this?

Thanks a lot!

Ken White
  • 123,280
  • 14
  • 225
  • 444
Erin L
  • 95
  • 1
  • 5

1 Answers1

0

With 40+ columns, it may be a lot of work to implement the logic using a CASE WHEN eg.

CASE
    WHEN typeA > typeB AND typeA > typeC THEN 'typeA'
    ...etc
END

Another approach may be to let the database do the work with ranking and agreggrate functions as shown below. I've unpivoted the data with the first CTE all_data and added a meta column. You may also explore this other amazon athena approach if desired. I then continued to rank the data to find which type was the largest. I then determined whether there were multiple types ranked the same similar to row 3 where typeA and typeB had the same value. A case statement then decided whether I would use the actual type or can't decide. With the classification for each row id, I then used an inner join to add this new column to the final projection. See sample below:

WITH all_data AS (
    SELECT id, typeA as val, 'typeA' as type from my_table UNION ALL
    SELECT id, typeB as val, 'typeB' as type from my_table UNION ALL
    SELECT id, typeC as val, 'typeC' as type from my_table 
),
ranked_data AS (
    SELECT
        id,
        val,
        type,
        rank() over (partition by id order by val desc) as rnk
    FROM
        all_data     
),
classification AS (
   SELECT
       id,
       CASE 
           WHEN COUNT(id) > 1 THEN 'undecided'
           ELSE MAX(type)
       END as  class_type
   FROM
       ranked_data
   WHERE rnk=1
   GROUP BY id
   
)
SELECT
     m.*,
     c.class_type as classification
FROM
     my_table m
INNER JOIN
     classification c ON c.id = m.id
ggordon
  • 9,790
  • 2
  • 14
  • 27