2

I'm trying to create a table in hive based on some JSON data that I have in hadoop. The hard part is that I have a nested JSON where the data is not cleanly defined in key:value pairs:

{  
   "trafficSource":{  
      "source":"(direct)",
      "medium":"(none)"
   },
   "device":{  
      "browser":"(not set)",
      "browserVersion":"(not set)",
      "operatingSystem":"(not set)",
      "operatingSystemVersion":"(not set)",
      "isMobile":false,
      "flashVersion":"(not set)",
      "javaEnabled":false,
      "language":"(not set)",
      "screenColors":"(not set)",
      "screenResolution":"(not set)",
      "deviceCategory":"desktop"
   }

   "hits":[  
      {  
         "customDimensions":[  
            {  
               "index":"1",
               "value":"2"
            },
            {  
               "index":"2",
               "value":"0"
            }
         ],
      }
   ],
}

I'd like to create one column in my hive table for each value in "hits.customDimensions" of the JSON. Each array contains an index and a value for that index. For example, a column for index=1 would be named "User_ID" and the value would come from the "value" parameter in that array.

Is there an easy way to create this HIVE table?

Thanks!

Tim
  • 21
  • 1
  • 2

1 Answers1

0

You can use the json_tuple UDF in hive to parse through the attributes in the json, but I couldn't figure out a way to parse the nested json's.

The below would help you with parsing json :

How do you make a HIVE table out of JSON data?

To have the nested json's put into the Hive table, I would suggest you parse the json files using python & Hadoop streaming and convert them into flattened structures. It would be pretty easy going ahead from there.

Community
  • 1
  • 1
Akarsh
  • 389
  • 5
  • 15