I have a data frame with a format like:
id | product
-------------
1 | A
1 | B
1 | C
2 | A
3 | A
3 | C
What I want is a two-column data frame with one row per ID, where the second column is an array of every product owned by that ID. I tried some code with mapPartitions(), but I get errors about not being able to infer the schema. I know I have to yield something back from the map function, but I can't figure out what.
I'm using Spark 1.6.
Edit
In case anyone else has this question, I actually went with the solution here using combineByKey(): https://stackoverflow.com/a/27043562/1181412
It gave me more flexibility to work with the fields in a more granular way.
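
For anyone wondering what that looks like for the data above, here is a rough sketch of the three functions combineByKey() expects. To keep it runnable without a cluster, combine_by_key_local is a hypothetical plain-Python stand-in I wrote for illustration (it is not part of Spark), and the sample rows and partition split are assumptions based on the table in the question.

```python
# Sample rows mirroring the table in the question (assumed (id, product) pairs).
rows = [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (3, "A"), (3, "C")]


def create_combiner(product):
    # Called the first time a key is seen within a partition.
    return [product]


def merge_value(products, product):
    # Called for each further value of the same key within a partition.
    products.append(product)
    return products


def merge_combiners(left, right):
    # Called to merge the per-partition lists for the same key.
    left.extend(right)
    return left


def combine_by_key_local(partitions):
    """Hypothetical local stand-in for RDD.combineByKey, for illustration only."""
    per_partition = []
    for part in partitions:
        combined = {}
        for key, value in part:
            if key in combined:
                combined[key] = merge_value(combined[key], value)
            else:
                combined[key] = create_combiner(value)
        per_partition.append(combined)

    merged = {}
    for combined in per_partition:
        for key, products in combined.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], products)
            else:
                merged[key] = products
    return sorted(merged.items())


# Pretend the rows are spread over three partitions (an arbitrary split).
partitions = [rows[:2], rows[2:4], rows[4:]]
result = combine_by_key_local(partitions)
print(result)

# With a live SparkContext, the same three functions would be passed roughly as:
#   pairs = df.rdd.map(lambda r: (r.id, r.product))
#   grouped = pairs.combineByKey(create_combiner, merge_value, merge_combiners)
```

The order of products inside each array depends on how the rows are partitioned, so it should not be relied on in Spark. The granular control comes from being able to change what each of the three functions does to the fields before they are merged.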