How to transpose data in Hive, Impala or Spark?

Question

This is the familiar way I'd do it in SAS.

data long1 ; 
  input famid year faminc ; 
cards ; 
1 96 40000 
1 97 40500 
1 98 41000 
2 96 45000 
2 97 45400 
2 98 45800 
3 96 75000 
3 97 76000 
3 98 77000 
; 
run;

proc transpose data=long1 out=wide1 prefix=faminc;
    by famid ;
    id year;
    var faminc;
run;

proc print data = wide1;
run;

Obs    famid    _NAME_    faminc96    faminc97    faminc98

 1       1      faminc      40000       40500       41000
 2       2      faminc      45000       45400       45800
 3       3      faminc      75000       76000       77000

How can I achieve something similar using either Hive/Impala or Spark? The key point here is that the unique row values that will become columns after being transposed is not defined ahead of time. In essence, I cannot afford to use a series of CASE WHEN year='x' then faminc else 0 because one time I could have certain year values and sometimes I could have other, and not all famid will have all year.

score 0 · Answer 1 · edited May 23 '17 at 11:44

0

I found this one

How to pivot DataFrame?

A pivot operator has been added to the Spark dataframe API, and is part of Spark 1.6.

See https://github.com/apache/spark/pull/7841 for details.

edited May 23 '17 at 11:44

Community

1
1

answered Dec 16 '15 at 20:25

Tony

135
9

How to transpose data in Hive, Impala or Spark?

1 Answers1