I have a dataframe with this schema:
root
|-- customer_id: string (nullable = true)
|-- service: struct (nullable = true)
| |-- cat1: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- category: string (nullable = true)
| | | |-- match_id: string (nullable = true)
| |-- cat2: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- category: string (nullable = true)
| | | |-- match_id: string (nullable = true)
The actual data looks like this:
+-----------+-------------------------------------------------------------------------------+
|customer_id|service |
+-----------+-------------------------------------------------------------------------------+
|CID1 |[[[cat1, service1], [cat1, service3]],] |
|CID2 |[[[cat1, service4],], [[cat2, service7], [cat2, service8], [cat2, service9]]] |
+-----------+-------------------------------------------------------------------------------+
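For anyone who wants to reproduce this, a DataFrame of the same shape can probably be built along the following lines. This is only a sketch meant to be pasted into spark-shell: the case class names (`Item`, `Service`, `Customer`), the local master setting, and the use of `None` for the missing cat2 array are assumptions made for illustration, not part of the real pipeline.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case classes mirroring the schema above.
case class Item(category: String, match_id: String)
// Option is used so that a missing array becomes a null column value.
case class Service(cat1: Option[Seq[Item]], cat2: Option[Seq[Item]])
case class Customer(customer_id: String, service: Service)

// Assumption: a local session; in spark-shell this returns the existing `spark`.
val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()
import spark.implicits._

val df = Seq(
  Customer("CID1", Service(
    Some(Seq(Item("cat1", "service1"), Item("cat1", "service3"))),
    None)),
  Customer("CID2", Service(
    Some(Seq(Item("cat1", "service4"))),
    Some(Seq(Item("cat2", "service7"), Item("cat2", "service8"), Item("cat2", "service9")))))
).toDF()

// Should print a schema of the same shape as the one shown at the top.
df.printSchema()
df.show(truncate = false)
```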
I would like the transformed data to look like this:
+-----------+------+--------------------------------------------------------------------------+
|customer_id| cat | service |
+-----------+------+--------------------------------------------------------------------------+
|CID1 | cat1 | [[cat1, service1], [cat1, service3]] |
|CID2 | cat1 | [[cat1, service4]] |
|CID2 | cat2 | [[cat2, service7], [cat2, service8], [cat2, service9]] |
+-----------+------+--------------------------------------------------------------------------+
Or, even better (though I expect this is simple once I have the form above):
+-----------+------+-----------------------------------+
|customer_id| cat | service |
+-----------+------+-----------------------------------+
|CID1 | cat1 | [service1, service3] |
|CID2 | cat1 | [service4] |
|CID2 | cat2 | [service7, service8, service9] |
+-----------+------+-----------------------------------+
Here the new service column holds the contents of the original cat1 and cat2 fields, one row per field.
One thing to note is that there could be many fields under the original service struct, meaning there could be cat1, cat2, cat3, and so on.
I'm new to both Scala and Spark. I have searched for a while but haven't found a similar example.
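The reshaping described above could perhaps be sketched as follows. This is a minimal, untuned sketch rather than a definitive solution: the helper name `explodeServiceFields` is hypothetical, the cat column is filled from the struct field name (not from the category value inside each element), and it assumes every field under service has the same array-of-struct type and that field names contain no dots.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import org.apache.spark.sql.types.StructType

// Hypothetical helper: emits one row per (customer_id, field under `service`).
def explodeServiceFields(df: DataFrame): DataFrame = {
  // Read the field names (cat1, cat2, cat3, ...) from the schema
  // instead of hard-coding them.
  val fieldNames = df.schema("service").dataType.asInstanceOf[StructType].fieldNames

  // One (cat, service) struct per field; all structs must share the same type.
  val pairs = fieldNames.map { name =>
    struct(lit(name).as("cat"), col(s"service.$name").as("service"))
  }

  df.select(col("customer_id"), explode(array(pairs: _*)).as("pair"))
    .select(col("customer_id"), col("pair.cat").as("cat"), col("pair.service").as("service"))
    // Drop rows for fields that were null for this customer.
    .where(col("service").isNotNull)
}
```

Calling `explodeServiceFields(df)` should give the first target form. For the second form, selecting the nested field from the array, for example `.withColumn("service", col("service.match_id"))`, should reduce each element to just its match_id.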