Actually with the addition of stored procedures to Redshift this is possible
The procedure below accepts two parameters (source_table
and target_table
)
assuming both table exists it transforms the data described in the question
The way it works is
- Reads data from source table row by row
- Finds out max items in Items column
- In a loop extracts each item
- Inserts id + item combo into target table
CREATE OR REPLACE PROCEDURE Array_to_Rows(source_table VARCHAR, target_table VARCHAR)
LANGUAGE plpgsql
AS $$
DECLARE i INTEGER;
rec RECORD;
query VARCHAR;
item VARCHAR;
cnt INTEGER;
BEGIN
query := 'SELECT * FROM ' || source_table;
FOR rec IN EXECUTE query
LOOP
select INTO cnt regexp_count(rec.items,',')+1;
i := 1;
<< items_loop >>
LOOP
SELECT INTO item trim(split_part(rec.items,',',i));
EXECUTE 'INSERT INTO ' || target_table || ' values (' || rec.Empid || ',''' || item ||''')';
i := i + 1;
EXIT items_loop WHEN (i > cnt);
END LOOP;
END LOOP;
END;
$$
Usage: CALL Array_to_Rows('source table name
','target table name
')
With test data in the question it took less than 0.2 seconds, don't know how big OPs data set is
Output is
Empid item
1001 A
1001 B
1002 B
1003 C
1003 D
1003 E