
I have a DataFrame (Java API) with the following simple schema. Here is an example instance:

+-------------------+
| id | key | Value  |
+-------------------+
| 01 | A   | John   |
| 01 | B   | Nick   |
| 02 | A   | Mary   |
| 02 | B   | Kathy  |
| 02 | C   | Sabrina|
| 03 | B   | George |
+-------------------+

I would like to transform it to the following:

+------------------------------+
| id | A    | B      | C       |
+------------------------------+
| 01 | John | Nick   | null    |
| 02 | Mary | Kathy  | Sabrina |
| 03 | null | George | null    |
+------------------------------+

I tried the pivot operator (since this is exactly a pivot), but it only partially worked: once the values A, B, and C become columns, it seemed the contents of the new columns could only be numeric.

Dataset<Row> pivotTest2 = pivotTest.groupBy(col("id")).pivot("key").count();

What I would actually like is to put the value of the Value column in place of the count, i.e., something of the form .select(col("Value")). Even .max("Value") would work fine, but I cannot use it, since Value is not a numeric column.
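For reference, the reshaping being asked for — group the rows by id and spread each key into its own column, with null for missing keys — can be sketched in plain Java with maps (no Spark; the class and method names here are illustrative, not part of any API):

```java
import java.util.*;

public class PivotSketch {
    // Group rows of (id, key, value) by id, mapping key -> value: the pivot itself.
    public static SortedMap<String, Map<String, String>> pivot(String[][] rows) {
        SortedMap<String, Map<String, String>> byId = new TreeMap<>();
        for (String[] r : rows) {
            byId.computeIfAbsent(r[0], k -> new HashMap<>()).put(r[1], r[2]);
        }
        return byId;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"01", "A", "John"}, {"01", "B", "Nick"},
            {"02", "A", "Mary"}, {"02", "B", "Kathy"}, {"02", "C", "Sabrina"},
            {"03", "B", "George"}
        };
        // The distinct keys become the pivoted columns (A, B, C).
        SortedSet<String> keys = new TreeSet<>();
        for (String[] r : rows) keys.add(r[1]);

        // Print one line per id; keys absent for an id come out as null.
        for (Map.Entry<String, Map<String, String>> e : pivot(rows).entrySet()) {
            StringBuilder line = new StringBuilder(e.getKey());
            for (String k : keys) line.append(" | ").append(e.getValue().get(k));
            System.out.println(line);
        }
    }
}
```

This is only the semantics; Spark's pivot additionally needs an aggregate to pick a value when an (id, key) pair occurs more than once.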

philantrovert
John Bell

1 Answer


Doing the following should work for you:

import static org.apache.spark.sql.functions.*;

Dataset<Row> pivotTest2 = pivotTest.groupBy(col("id")).pivot("key").agg(first("Value"));
pivotTest2.show(false);

which should give you (row order is not guaranteed):

+---+----+------+-------+
|id |A   |B     |C      |
+---+----+------+-------+
|01 |John|Nick  |null   |
|03 |null|George|null   |
|02 |Mary|Kathy |Sabrina|
+---+----+------+-------+
Ramesh Maharjan