-2

I am trying to generate a new row for each value in col2. As the value is in string format. They are either comma or space separated.

col1            col2                  col3
1               ab cd ef              Denver
2               gh                    Chicago
3               jk,lm,kn,ut,zx        Dallas

Expected output

col1           col2        col3
1               ab         Denver
1               cd         Denver
1               ef         Denver
2               gh         Chicago
3               jk         Dallas
3               lm         Dallas
3               kn         Dallas
3               ut         Dallas
3               zx         Dallas

How can we achieve it in Redshift SQL

Codegator
  • 459
  • 7
  • 28
  • Please consider all the numerous solution options here: https://stackoverflow.com/questions/25112389/redshift-convert-comma-delimited-values-into-rows There you can you choose the "simplest". – Paul Maxwell Jun 01 '23 at 01:13
  • See [split_to_array](https://docs.aws.amazon.com/redshift/latest/dg/split_to_array.html) in documentation but there is no `unnest` it seems but once you have col2 in an array you should be able to use JOIN to produce rows. I have no access to redshift so I cannot test any use of that function. – Paul Maxwell Jun 01 '23 at 01:20

1 Answers1

-1

You can extract each value from col2 using split_part() but to get the required rows you will need a set of rows that have the number 1 to N that you can use as a parameter into that function. One way to do that is through a recursive CTE:

WITH numbers
AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM numbers
    WHERE n < 100 -- choose this max number to suit your situation
    )
SELECT
      col1
    , split_part(replace(col2, ' ', ','), ',', n) AS col2 -- for space or comma separators
    , col3
FROM table_name
CROSS JOIN numbers
WHERE split_part(col2, ',', n) IS NOT NULL
    AND split_part(col2, ',', n) != '';

The cross join forces multiple rows to be generated, and on each row the number (n) is passed into split_part to get the wanted col2 values per row. The where clause limits the result to only the rows with col2 values. Note: 100 used above is a guess, use whatever you believe is the maximum number of values that can be in col2.

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
  • The actual table has 500k of records. Please suggest a simpler solution for it. – Codegator May 31 '23 at 17:04
  • The number of rows is actually irrelevant, and not at all related to "simplicity" of a sql query. You may be concerned with performance, where you should be concerned with the efficiency of a query. The solution provided above will work for 5 or 500,000 rows. I did find [split_to_array](https://docs.aws.amazon.com/redshift/latest/dg/split_to_array.html) in documentation but there is no `unnest` so you are on your own with that as I do not have access to redshift and cannot test how to work with that function. – Paul Maxwell Jun 01 '23 at 01:11
  • Here is a different approach to generating the numbers for using split_part - https://stackoverflow.com/a/31998832/2067753 - it does propose a method for calculating the max number you will need - but this adds to processing time – Paul Maxwell Jun 01 '23 at 01:27