2

How can i split an array of values in a column into corresponding rows in Redshift using a delimiter (,) ?

Input Data:-

—————————————  
Empid | Items  
—————————————  
1001| A, B  
1002| B  
1003| C, D, E  

Required Output:-

—————————————  
Empid | Items  
—————————————  
1001| A  
1001| B  
1002| B  
1003| C  
1003| D  
1003| E  

Any help is appreciated.

Thanks

Prince Vegeta
  • 719
  • 6
  • 21
Matthew
  • 315
  • 3
  • 5
  • 16

2 Answers2

7

Based on the official docs, you can do with JOIN!

Let's say your input is:

—————————————  
empid | items  
—————————————  
1001| [A, B]  
1002| [B]  
1003| [C, D, E]
1004| []

Then you can do it as:

SELECT t.empid, items as item
FROM table_name AS t
    LEFT JOIN t.items AS items ON TRUE

This will returns:

—————————————  
empid | item 
—————————————  
1001| A  
1001| B  
1002| B  
1003| C  
1003| D  
1003| E
1004| <NULL>  
Balint
  • 343
  • 5
  • 13
  • 1
    Only if the item column is already an array, not if it's a simple `VARCHAR()` – MatBailie Sep 07 '21 at 15:36
  • 1
    Yes, but I reckon that's a different question which is already answered here: https://stackoverflow.com/questions/25112389/redshift-convert-comma-delimited-values-into-rows – Balint Sep 08 '21 at 11:32
0

Actually with the addition of stored procedures to Redshift this is possible

The procedure below accepts two parameters (source_table and target_table) assuming both table exists it transforms the data described in the question

The way it works is

  1. Reads data from source table row by row
  2. Finds out max items in Items column
  3. In a loop extracts each item
  4. Inserts id + item combo into target table
CREATE OR REPLACE PROCEDURE Array_to_Rows(source_table VARCHAR, target_table VARCHAR) 
LANGUAGE plpgsql
AS $$
DECLARE i INTEGER;
        rec RECORD;
        query VARCHAR;
        item VARCHAR;
        cnt INTEGER;
BEGIN
    query := 'SELECT * FROM ' || source_table;

    FOR rec IN EXECUTE query
    LOOP
        select INTO cnt regexp_count(rec.items,',')+1;
        i := 1; 

        << items_loop >>
        LOOP
            SELECT INTO item trim(split_part(rec.items,',',i));
            EXECUTE 'INSERT INTO ' || target_table || ' values (' || rec.Empid || ',''' || item ||''')';
            i := i + 1;
            EXIT items_loop WHEN (i > cnt);
        END LOOP;  
    END LOOP;      
END;
$$

Usage: CALL Array_to_Rows('source table name','target table name')

With test data in the question it took less than 0.2 seconds, don't know how big OPs data set is

Output is

Empid   item
1001    A
1001    B
1002    B
1003    C
1003    D
1003    E
demircioglu
  • 3,069
  • 1
  • 15
  • 22