
I have a requirement where columns with values need to be transposed into rows. For instance, refer to the table below:

cust:

cust_id | cover1 | cover2 | cover3
1234    | 'PAG'  | Null   | 'TDE'
5678    | Null   | 'GAP'  | Null

Given the above table, we have to find out which columns have a value, and for every column that holds a value a row should be created. For example:

cust_id | cover
1234    | 'PAG'
1234    | 'TDE'
5678    | 'GAP'

For customer 1234 only cover1 and cover3 are populated, hence 2 records are created. For 5678, cover1 and cover3 are Null, hence only 1 record, for cover2, needs to be created.

I could apply a simple approach like the one below, but I was wondering if there is a more elegant and smarter solution to this.

select cust_id, cover1 AS cover from cust where cover1 IS NOT NULL
UNION ALL
select cust_id, cover2 AS cover from cust where cover2 IS NOT NULL
UNION ALL
select cust_id, cover3 AS cover from cust where cover3 IS NOT NULL

Please share your thoughts. We use Spark-SQL 2.4

Thanks

marie20

2 Answers


Maybe like this:

SELECT CUST_ID, COVER FROM (
   SELECT * FROM test
   UNPIVOT (
      COVER
      FOR COVER_C
      IN (COVER1,
          COVER2,
          COVER3))
);

Here is the DEMO

VBoka
  • It's not working. The error is: `org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'FROM' expecting (line 1, pos 17)` – marie20 Oct 29 '19 at 09:47
  • @marie20 have you seen my example? In my example I have entered the data you gave us and everything works. Can you please tell me what is different in your example compared to the one in the demo? Do you use Oracle Database? Thanks! – VBoka Oct 29 '19 at 09:59
  • Hi, thanks for replying... I don't have access to Oracle; we work on Spark-SQL. It would be great if you could provide the Spark-SQL variant of your query – marie20 Oct 29 '19 at 11:41
  • Maybe this will help: https://stackoverflow.com/questions/42465568/unpivot-in-spark-sql-pyspark – VBoka Oct 29 '19 at 12:45
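
Since Spark-SQL 2.4 does not support an UNPIVOT clause, the idea behind the linked question can also be written with explode() over a map of column-name/value pairs. A minimal sketch, assuming the cust table from the question (the alias c and the output column names cover_nm/cover are illustrative):

-- Sketch: emulate UNPIVOT by exploding a map of (column name, value) pairs;
-- null cover values are dropped by the where clause.
select cust_id, cover
from cust
lateral view explode(map('cover1', cover1,
                         'cover2', cover2,
                         'cover3', cover3)) c as cover_nm, cover
where cover is not null;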

The requirement can be met by using Spark's stack() function in %sql mode:

create temporary view cust 
as 
select * from values (1234, 'PAG', Null, 'TDE'), 
                     (5678, Null, 'GAP', Null) as t(cust_id, cover1, cover2, cover3);

select * from 
(select cust_id, 
stack(3, 'cover1', cover1, 
         'cover2', cover2, 
         'cover3', cover3) as (cover_nm, cover)
from cust
)
where cover is not null;
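
With the sample data above, stack() emits three name/value pairs per input row and the outer filter drops the null covers, so the query should return (row order not guaranteed):

cust_id | cover_nm | cover
1234    | cover1   | 'PAG'
1234    | cover3   | 'TDE'
5678    | cover2   | 'GAP'

Selecting cust_id and cover in the outer query instead of * gives exactly the two-column shape asked for in the question.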
marie20