
This is the familiar way I'd do it in SAS.

data long1 ; 
  input famid year faminc ; 
cards ; 
1 96 40000 
1 97 40500 
1 98 41000 
2 96 45000 
2 97 45400 
2 98 45800 
3 96 75000 
3 97 76000 
3 98 77000 
; 
run;

proc transpose data=long1 out=wide1 prefix=faminc;
    by famid ;
    id year;
    var faminc;
run;

proc print data = wide1;
run;

Obs    famid    _NAME_    faminc96    faminc97    faminc98

 1       1      faminc      40000       40500       41000
 2       2      faminc      45000       45400       45800
 3       3      faminc      75000       76000       77000

How can I achieve something similar using either Hive/Impala or Spark? The key point is that the distinct row values that become columns after the transpose are not known ahead of time. In other words, I cannot hard-code a series of CASE WHEN year = 'x' THEN faminc ELSE 0 END expressions, because the set of year values varies from run to run, and not every famid has every year.

Joe

1 Answer


I found a related question:

How to pivot DataFrame?

A pivot operator was added to the Spark DataFrame API in Spark 1.6.

See https://github.com/apache/spark/pull/7841 for details.

Tony