This is the familiar way I'd do it in SAS.
data long1 ;
input famid year faminc ;
cards ;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;
run;
proc transpose data=long1 out=wide1 prefix=faminc;
by famid ;
id year;
var faminc;
run;
proc print data = wide1;
run;
Obs famid _NAME_ faminc96 faminc97 faminc98
1 1 faminc 40000 40500 41000
2 2 faminc 45000 45400 45800
3 3 faminc 75000 76000 77000
How can I achieve something similar using either Hive/Impala or Spark?
The key point here is that the unique row values that will become columns after being transposed is not defined ahead of time.
In essence, I cannot afford to use a series of CASE WHEN year='x' then faminc else 0
because one time I could have certain year values and sometimes I could have other, and not all famid will have all year.