I have a Spark SQL query (written in Scala) whose output is the table below, where {name, type, category} is unique. Only type has a limited number of distinct values (5-6 unique types).
| name   | type  | category | value  |
|--------|-------|----------|--------|
| First  | type1 | cat1     | value1 |
| First  | type1 | cat2     | value2 |
| First  | type1 | cat3     | value3 |
| First  | type2 | cat1     | value1 |
| First  | type2 | cat5     | value4 |
| Second | type1 | cat1     | value5 |
| Second | type1 | cat4     | value5 |
I'm looking for a way to convert this into JSON with Spark so that the output looks like the following, i.e. one object per name & type combination:
[
  {
    "name": "First",
    "type": "type1",
    "result": {
      "cat1": "value1",
      "cat2": "value2",
      "cat3": "value3"
    }
  },
  {
    "name": "First",
    "type": "type2",
    "result": {
      "cat1": "value1",
      "cat5": "value4"
    }
  },
  {
    "name": "Second",
    "type": "type1",
    "result": {
      "cat1": "value5",
      "cat4": "value5"
    }
  }
]
Is this possible with Spark/Scala? Any pointers or references would be really helpful. Eventually I have to write the JSON output to S3, so if this can be done during the write, that would also work.
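For context, here is a minimal sketch of the direction I've been exploring (untested; it assumes Spark 2.4+ for `map_from_entries`, and the S3 bucket path is a placeholder):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("GroupToJson").getOrCreate()
import spark.implicits._

// Sample data in the same shape as my query output.
val df = Seq(
  ("First",  "type1", "cat1", "value1"),
  ("First",  "type1", "cat2", "value2"),
  ("First",  "type1", "cat3", "value3"),
  ("First",  "type2", "cat1", "value1"),
  ("First",  "type2", "cat5", "value4"),
  ("Second", "type1", "cat1", "value5"),
  ("Second", "type1", "cat4", "value5")
).toDF("name", "type", "category", "value")

// Group by (name, type) and fold the category -> value pairs into a map column,
// so each row becomes {name, type, result: {category -> value, ...}}.
val grouped = df
  .groupBy("name", "type")
  .agg(map_from_entries(collect_list(struct($"category", $"value"))).as("result"))

// Write the grouped rows as JSON; "s3a://my-bucket/output/" is a placeholder path.
grouped.write.json("s3a://my-bucket/output/")
```

One thing I'm unsure about: as far as I know, Spark's JSON writer emits one object per line (JSON Lines) rather than a single top-level array, so the on-disk format would differ slightly from the example above.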