I have a pipe-delimited dataset that contains a mix of simple (String) fields and a complex (JSON) field, for example:
1111|1234567891011|ABC11|JOSE|"linkEnrollment": {"Group": [{"action": "ADD","groupType": "ROSS","groupId": "GRP-1","isValid": "Y"},{"action": "ADD","groupType": "CROSS","groupId": "GRP-2","isValid": " "}]}
2222|9876543256827|ABC22|JACK|"linkEnrollment": {"Group": [{"action": "DEL","groupType": "ROCK","groupId": "GRP-7","isValid": "N"}]}
Corresponding columns are:
UUID(String)|PID(String)|DEVID(String)|FIRSTNAME(String)|LINK(String which is a JSON)
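For reference, this is roughly the schema I have in mind (an untested sketch; I am keeping the JSON in LINK as a plain string for now):

    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    // Schema matching the five columns above; LINK is kept as a raw
    // JSON string so it survives the load unchanged.
    StructType schema = DataTypes.createStructType(new StructField[] {
        DataTypes.createStructField("UUID",      DataTypes.StringType, false),
        DataTypes.createStructField("PID",       DataTypes.StringType, false),
        DataTypes.createStructField("DEVID",     DataTypes.StringType, false),
        DataTypes.createStructField("FIRSTNAME", DataTypes.StringType, false),
        DataTypes.createStructField("LINK",      DataTypes.StringType, true)
    });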
My requirement is to load this data into a Hive table using Spark with Java. I need to know:
- How to read the above data and convert it into a DataFrame (using a StructType schema) so it can be inserted into a Hive table (a rough sketch of what I have tried follows this list).
- How to load the LINK column into the Hive table, and what its data type should be in the table.
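This is the approach I have been attempting so far (an untested sketch; the input path and table name are placeholders):

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
            .appName("PipeDelimitedToHive")
            .enableHiveSupport()
            .getOrCreate();

    // Read the raw file and split each line on the pipe delimiter.
    // The limit of 5 keeps any pipe characters inside the trailing JSON intact.
    JavaRDD<Row> rows = spark.read()
            .textFile("/path/to/input.txt")   // placeholder path
            .javaRDD()
            .map(line -> {
                String[] parts = line.split("\\|", 5);
                return RowFactory.create(parts[0], parts[1], parts[2], parts[3], parts[4]);
            });

    // Apply the StructType schema shown above and write into Hive;
    // LINK would end up as a STRING column holding the raw JSON text.
    Dataset<Row> df = spark.createDataFrame(rows, schema);
    df.write().mode("append").saveAsTable("mydb.my_table");  // placeholder table name

Is keeping LINK as a STRING column the right way to store the JSON in Hive, or is there a better-suited type?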