0

I have the following table:

ID   Fruit
A    apple,orange,grape
A    apple,orange,cucumber
A    apple,orange
B    orange,grape
B    apple
B    grape
C    grape,banana
C    orange,banana
C    banana

I am hoping to add new columns to denote whether a certain ID has had specific characters in the fruit row

ID   Fruit                  apple  banana  orange
A    apple,orange,grape     yes    no      yes
A    apple,orange,cucumber  yes    no      yes
A    apple,orange           yes    no      yes
B    orange,grape           yes    no      yes
B    apple                  yes    no      yes
B    grape                  yes    no      yes
C    grape,banana           no     yes     yes
C    orange,banana          no     yes     yes
C    banana                 no     yes     yes

I asked a similar question a while back here.

SELECT a.*,
       case when count(case when a.Fruit like 'apple' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end [Apple],
       case when count(case when a.Fruit like 'banana' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end [Banana],
       case when count(case when a.Fruit like 'orange' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end [Orange]
FROM MyTable a

This, however, is being very resource heavy and slow, given the size of my tables. In terms of performance, is there a more efficient solution to this with coalesce or join or any other method?

Shahin
  • 1,196
  • 1
  • 8
  • 15
  • 2
    first don't store delimited data see https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad – nbk Oct 15 '22 at 23:28
  • Please include the DBMS you're using. – John K. Oct 16 '22 at 01:36
  • if you are using Snowflake, it is completely valid to usefull to store CSV, JSON, XML in the database. Now if you are having performance problems, you will get some gain from converting to natural columns, but like ALL advice, you should improve the SQL when you find it performs poorly. – Simeon Pilgrim Oct 16 '22 at 02:51

1 Answers1

1

so your SQL needs some tweak to work in Snowflake, the [square bracket] alias form is only valid in T-SQL.

SELECT a.*,
       case when count(case when a.Fruit like 'apple' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end as "Apple",
       case when count(case when a.Fruit like 'banana' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end as "Banana",
       case when count(case when a.Fruit like 'orange' then 1 end) over (partition by ID) > 0 then 'yes' else 'no' end as "Orange"
FROM MyTable as a

in fact you can use COUNT_IF and IFF to tighten up that to:

SELECT a.*,
       iff(count_if(a.Fruit like 'apple') over (partition by a.ID) > 0, 'yes', 'no') as "Apple",
       iff(count_if(a.Fruit like 'banana') over (partition by a.ID) > 0, 'yes', 'no') as "Banana",
       iff(count_if(a.Fruit like 'orange') over (partition by a.ID) > 0, 'yes', 'no') as "Orange"
FROM MyTable as a

But this is SLOW, because you are do the count for every row even though the answer will be the same for all rows of A, B, etc. Thus you can join the aggregated data:

select *
FROM MyTable as a
natural join (
    SELECT c.id,
       iff(count_if(c.Fruit like 'apple') > 0, 'yes', 'no') as "Apple",
       iff(count_if(c.Fruit like 'banana')  > 0, 'yes', 'no') as "Banana",
       iff(count_if(c.Fruit like 'orange') > 0, 'yes', 'no') as "Orange"
    FROM MyTable as c
    GROUP BY 1
) as b
ID FRUIT Apple Banana Orange
A apple,orange,grape no no no
A apple,orange,cucumber no no no
A apple,orange no no no
B orange,grape yes no no
B apple yes no no
B grape yes no no
C grape,banana no yes no
C orange,banana no yes no
C banana no yes no
Simeon Pilgrim
  • 22,906
  • 3
  • 32
  • 45