I am trying to read data from json files in S3 into my Hive table. If the column names and json keys are same its all loading properly.
But now I want to read data in such a way that the nested json values goes into specific columns (For eg: for json
{"data1": {"key1": "value1"}}
I want the data1.key1
value to go into column named data1_key1
; which I understand is achievable with SERDEPROPERTIES
.
My next problem is there can be multiple json keys and I want the key names to be column values in my Hive table.
Also, depending upon those keys, the keys that go into other columns will also change.
For eg my json files will be either:
{"data1" : {"key1":"value1"}}
or
{"data2" : { "key2" : "value2"}}
This need to create a table as below:
col1 col2
data1 value1
data2 value2
Is this possible? If so how should it be done?