
I have a Spark DataFrame (oldDF) that looks like this:

Id     | Category | Count
898989 | 5        | 12
676767 | 12       | 1
334344 | 3        | 2
676767 | 13       | 3

I want to create a new DataFrame whose columns are the Category values and whose cell values are the Counts, grouped by Id.

The reason I can't (or would rather not) specify a schema up front is that the categories change a lot. Is there any way to do it dynamically?

The output I would like to see as a DataFrame, built from the one above:

Id     | V3 | V5 | V12 | V13
898989 | 0  | 12 | 0   | 0
676767 | 0  | 0  | 1   | 3
334344 | 2  | 0  | 0   | 0
  • There is a typo in your code, round brackets are not closed properly. – Anas Jan 11 '16 at 13:48
  • what is type of Category column? – Anas Jan 11 '16 at 14:02
  • Can you please elaborate the actual use case? What do you mean by categories change a lot? – Anas Jan 11 '16 at 14:40
  • can you please provide an example of the output DataFrame that you are looking for? you can change your oldDF and add some more data to it, and then make an example of the output DataFrame. – Rami Jan 11 '16 at 14:43
  • The categories are never the same for different models so I would have to write 30-40 different schemas as far as I understand right now. – Abdul Merzoug Jan 11 '16 at 14:59
  • I think @zero323 answered a similar question before. But I'm on my phone I can't search for it now... – eliasah Jan 11 '16 at 20:37

2 Answers


You need to do your groupBy operation first; then you can apply a pivot operation, as explained here.
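The group-then-pivot idea can be sketched without Spark. Here is a minimal plain-Python illustration of the same logic; the sample rows come from the question, everything else (variable names, the two-step structure) is illustrative:

```python
from collections import defaultdict

# (Id, Category, Count) rows from the question
rows = [
    (898989, 5, 12),
    (676767, 12, 1),
    (334344, 3, 2),
    (676767, 13, 3),
]

# Step 1: "groupBy(Id)" -- collect counts per Id, keyed by category
grouped = defaultdict(dict)
for id_, cat, cnt in rows:
    grouped[id_][cat] = grouped[id_].get(cat, 0) + cnt

# Step 2: "pivot(Category)" -- one column per distinct category,
# discovered dynamically from the data; missing values filled with 0
categories = sorted({cat for _, cat, _ in rows})
pivoted = {
    id_: [cats.get(c, 0) for c in categories]
    for id_, cats in grouped.items()
}

print(categories)       # [3, 5, 12, 13]
print(pivoted[676767])  # [0, 0, 1, 3]
```

The key point for the asker's "dynamic schema" concern: the column set is derived from the data itself, not declared in advance, which is exactly what Spark's pivot does when you don't pass an explicit list of values.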

Rami

With Spark 1.6

oldDF.groupBy("Id").pivot("Category").sum("Count")
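Two follow-up details to reach the exact output in the question: pivot names the new columns after the raw category values (3, 5, 12, ...) and leaves null where an Id has no rows for a category, so you would still fill nulls with 0 (e.g. `na.fill(0)` in Scala, `fillna(0)` in PySpark) and rename the columns. The desired `V3`/`V5` headers suggest a simple "V" prefix; generating those names is plain string work, sketched here (the prefix itself is an assumption taken from the question's sample output):

```python
# Distinct category values, as pivot would discover them from the data
categories = [3, 5, 12, 13]

# Build the renamed headers dynamically; "V" prefix assumed from the
# question's desired output
new_names = ["Id"] + ["V{}".format(c) for c in categories]

print(new_names)  # ['Id', 'V3', 'V5', 'V12', 'V13']
```

In Spark the renamed list could then be applied in one go with `toDF` (Scala: `df.toDF(newNames: _*)`; PySpark: `df.toDF(*new_names)`).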
Arnon Rotem-Gal-Oz