Adding new columns to whether an ID has a certain values in another column

Question

I have the following table:

ID   Fruit
A    apple,orange,grape
A    apple,orange,cucumber
A    apple,orange
B    orange,grape
B    apple
B    grape
C    grape,banana
C    orange,banana
C    banana

I am hoping to add new columns to denote whether a certain ID has had specific characters in the fruit row

ID   Fruit                  apple  banana  orange
A    apple,orange,grape     yes    no      yes
A    apple,orange,cucumber  yes    no      yes
A    apple,orange           yes    no      yes
B    orange,grape           yes    no      yes
B    apple                  yes    no      yes
B    grape                  yes    no      yes
C    grape,banana           no     yes     yes
C    orange,banana          no     yes     yes
C    banana                 no     yes     yes

I asked a similar question a while back here.

SELECT a.*,
       case when count(case when a.Fruit like 'apple' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end [Apple],
       case when count(case when a.Fruit like 'banana' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end [Banana],
       case when count(case when a.Fruit like 'orange' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end [Orange]
FROM MyTable a

This, however, is being very resource heavy and slow, given the size of my tables. In terms of performance, is there a more efficient solution to this with coalesce or join or any other method?

first don't store delimited data see https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad — nbk, Oct 15 '22 at 23:28
if you are using Snowflake, it is completely valid to usefull to store CSV, JSON, XML in the database. Now if you are having performance problems, you will get some gain from converting to natural columns, but like ALL advice, you should improve the SQL when you find it performs poorly. — Simeon Pilgrim, Oct 16 '22 at 02:51

Simeon Pilgrim · Accepted Answer · 2022-10-16T02:55:19.997

so your SQL needs some tweak to work in Snowflake, the [square bracket] alias form is only valid in T-SQL.

SELECT a.*,
       case when count(case when a.Fruit like 'apple' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end as "Apple",
       case when count(case when a.Fruit like 'banana' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end as "Banana",
       case when count(case when a.Fruit like 'orange' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end as "Orange"
FROM MyTable as a

in fact you can use COUNT_IF and IFF to tighten up that to:

SELECT a.*,
       iff(count_if(a.Fruit like 'apple') over (partition by a.ID) > 0, 'yes', 'no') as "Apple",
       iff(count_if(a.Fruit like 'banana') over (partition by a.ID) > 0, 'yes', 'no') as "Banana",
       iff(count_if(a.Fruit like 'orange') over (partition by a.ID) > 0, 'yes', 'no') as "Orange"
FROM MyTable as a

But this is SLOW, because you are do the count for every row even though the answer will be the same for all rows of A, B, etc. Thus you can join the aggregated data:

select *
FROM MyTable as a
natural join (
    SELECT c.id,
       iff(count_if(c.Fruit like 'apple') > 0, 'yes', 'no') as "Apple",
       iff(count_if(c.Fruit like 'banana')  > 0, 'yes', 'no') as "Banana",
       iff(count_if(c.Fruit like 'orange') > 0, 'yes', 'no') as "Orange"
    FROM MyTable as c
    GROUP BY 1
) as b

ID	FRUIT	Apple	Banana	Orange
A	apple,orange,grape	no	no	no
A	apple,orange,cucumber	no	no	no
A	apple,orange	no	no	no
B	orange,grape	yes	no	no
B	apple	yes	no	no
B	grape	yes	no	no
C	grape,banana	no	yes	no
C	orange,banana	no	yes	no
C	banana	no	yes	no

Adding new columns to whether an ID has a certain values in another column

1 Answers1

Linked