My wide data look like this:
What I am trying to accomplish is long:
I have many Score_X's and each score has many items. So the less hard-coding (e.g. Convert data from wide format to long format in SQL) the better.
I have thought about a few ways to do this; unfortunately Hive
does not have many features that other SQL
implementations have. So first I would appreciate a solution to my problem, and secondly, if anyone knows easy ways to emulate these things in Hive
please do share with me.
- The
pivot
function, whichHive
doesn't have. - I tried to apply Joe Stefanelli's answer in Selecting all columns that start with XXX using a wildcard?.
Hive
does not haveINFORMATION_SCHEMA
either. I was told (also by stackoverflow) that I could get table metadata by first installingMySQL
and then detour throughMySQL
; I don't feel like spending that much effort on a simple task like reshaping a table... - Then I think I can combine the values of Score_A_1, Score_A_2 and Score_A_3 into one Score_A array and then do a
LATERAL VIEW EXPLODE
like in myui's answer in How to transpose/pivot data in hive?. But I Googled around and could not find a tutorial to do that.
Thanks. Your help is greatly appreciated.
Update:
So the array
function will create an array column from multiple columns. Now I am doing the LATERAL VIEW EXPLODE
; through hard-coding (i.e., non-dynamic query) I am getting what I want. However it is difficult to believe that there is not a simpler way to perform a data management task as basic as reshaping. Am I missing something fundamental about Hive
?