0

I wanted to Create hive table with Json array I am facing issue with top level array. can anyone suggest me a solution. My json object looks like below

  [{"user_id": "a"," previous_user_id": "b"},{"user_id": "c"," previous_user_id": "d"},{"user_id": "e"," previous_user_id": "f"}]

Hive command to create the table:

create external table array_tmp (User array<struct<user_id: String, previous_user_id:String>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'

select user.user_id from array_tmp gives exception as

Row is not a valid JSON Object.

I have added the jar ADD JAR json-serde-1.3.8-jar-with-dependencies.jar; Any suggestion ?

Kaushik Nayak
  • 30,772
  • 5
  • 32
  • 45
Sindhu
  • 21
  • 1
  • 4

1 Answers1

1

You may need to make few changes. Here is an example

myjson/data.json

{"users":[{"user_id": "a"," previous_user_id": "b"},{"user_id": "c"," previous_user_id": "d"},{"user_id": "e"," previous_user_id": "f"}]}

Now create a Hive table

ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

CREATE EXTERNAL TABLE tbl( users array<struct<user_id:string,previous_user_id:string>>) 
ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe" 
location '/user/cloudera/myjson';

Do a select

select users.user_id from tbl;

+----------------+--+
|    user_id     |
+----------------+--+
| ["a","c","e"]  |
+----------------+--+
Bala
  • 11,068
  • 19
  • 67
  • 120
  • I get json object from different team and will not be able to change the json file. The json file has top level arrays and it is a valid json. Is there a way to handle the top level arrays in a json without changing the json object – Sindhu Mar 09 '18 at 16:36
  • @Sindhu: it is a valid json but not a properly designed one. Check https://stackoverflow.com/questions/5034444/can-json-start-with – Bala Mar 09 '18 at 19:38