How can I define the schema for a json array so that I can explode it into rows?
I have a UDF which returns a string (json array), I want to explode the item in array into rows and then save it.
sample json
[{"name":"name1","id":"1"}, {"name":"name2","id":"2"}]
I defined by Schema as
// schema of each item in the array
StructType arrayItemDataType= new StructType(new structfiled("name"), new structfiled(id));
var rootType = new ArrayType(itemDataType); // array of items
var field = new StructField(name: "MyJson", dataType: rootType, isNullable: false);
StructType schema = new StructType(new structField(field));
df.select(from_json('mycol', schema))
after explode tt creates schema like
root
|-- col: struct (nullable = true)
| |-- name: string
| |-- id: string
I am running on local cluster and writing to csv, I was expecting after explode it should have dataframe with 2 columns name and id and i can write all rows in csv. When I run it is not creating df schema as name,id and fails to write to csv with message "csv doesn't support struct<"