1

I have a table called Product. It has column called colors. It contains data as follows:

enter image description here

The data type for colors is varchar. The database contains large product values (almost 2 hundred) , and same applies to colors. Now I need to write an sql query script to find top 2 products having largest variation in colors.

manoj adhikari
  • 311
  • 2
  • 13

2 Answers2

3

You need to split and count the colors for each product(brand):

SELECT products, colors
FROM (
   SELECT 
      *,
      DENSE_RANK() OVER (ORDER BY (
         SELECT COUNT(DISTINCT LTRIM(RTRIM([value]))) 
         FROM STRING_SPLIT(SUBSTRING(colors, 2, LEN(colors) - 2), ',')
         ) DESC
      ) AS rank
   -- FROM YourTable
   FROM (VALUES
      ('adidas', '{red, blue, black}'),
      ('puma', '{red, green, blue, orange}'),
      ('nike', '{red, green}')
   ) product (products, colors)
) t
WHERE rank <= 2

Result:

products colors 
----------------------------------
puma     {red, green, blue, orange}
adidas   {red, blue, black}

If you need to count the colors, simply add one additional column:

SELECT products, colors
FROM (
   SELECT 
      *,
      (
         SELECT COUNT(DISTINCT LTRIM(RTRIM([value]))) 
         FROM STRING_SPLIT(SUBSTRING(colors, 2, LEN(colors) - 2), ',')
      ) AS [count]
   FROM (VALUES
      ('adidas', '{red, blue, black, blue, red}'),
      ('puma', '{red, green, blue}'),
      ('nike', '{red, green}')
   ) product (products, colors)
) t
WHERE [count] > 2
Zhorov
  • 28,486
  • 6
  • 27
  • 52
  • the database is large and contains large product values (almost 2 hundred) , and same applies to colors. How can we optimize FROM (VALUES ('')) portion, since I cannot manually do that. – manoj adhikari Dec 23 '21 at 09:04
  • 2
    @manojadhikari that is the sample data you gave us, you wouldn't use a `VALUES` table construct you would use *your table* obviously. – Thom A Dec 23 '21 at 09:09
  • @Larnu Thanks !! , since I am new on this and learning can yo refactor that please using table name. – manoj adhikari Dec 23 '21 at 09:13
  • @Zhorov Thanks !! , since I am new on this and learning can yo refactor that please using table name. – manoj adhikari Dec 23 '21 at 09:13
  • 2
    It's not my place to do that some one's answer, @manojadhikari , but it would just be `FROM dbo.YourTable`. – Thom A Dec 23 '21 at 09:15
  • @Larnu How do I find products that have variation more than 2 – manoj adhikari Dec 23 '21 at 09:22
  • 1
    That's a different question, @manojadhikari, however, I suggest having a look at [Turning a Comma Separated string into individual rows](https://stackoverflow.com/questions/5493510/turning-a-comma-separated-string-into-individual-rows) as a starting point. – Thom A Dec 23 '21 at 09:24
1
WITH CTE(PRODUCT,COLOR) AS
(
  SELECT 'ADIDAS','{RED,BLUE}' UNION ALL
  SELECT 'REEBOK','{PURPLE,GRAY,BLACK,ORANGE}' UNION ALL
  SELECT 'NIKE','{}'UNION ALL
  SELECT 'PUMA','{RED,BLACK,ORANGE}'
)
SELECT TOP 2 C.PRODUCT,C.COLOR,LEN(C.COLOR)XX
FROM CTE AS C
ORDER BY XX DESC

Could you please try if the above is suitable for you

Sergey
  • 4,719
  • 1
  • 6
  • 11
  • I don't think this is suitable for large database where values of colors and products are in thousands. – manoj adhikari Dec 23 '21 at 09:01
  • 2
    No, the "best" solution would be to fix and normalise your design for the most performant solution, @manojadhikari . – Thom A Dec 23 '21 at 09:14
  • 1
    not sure this would be reliable, eg {mustard yellow, chartreuse}, {red,blue,pink} – Stu Dec 23 '21 at 09:23