I have a Dataframe (Java) that has the following simple schema . Here is an example instance:
+-------------------+
| id | key | Value |
+-------------------+
| 01 | A | John |
| 01 | B | Nick |
| 02 | A | Mary |
| 02 | B | Kathy |
| 02 | C | Sabrina|
| 03 | B | George |
+-------------------+
I would like to transform it to the following:
+------------------------------+
| id | A | B | C |
+------------------------------+
| 01 | John | Nick | null |
| 02 | Mary | Kathy | Sabrina |
| 03 | null | George | null |
+------------------------------+
I tried the pivot operator (because this is what it is actually) but worked partially because once the values A B and C become columns, the contents of the columns can only be numeric.
Dataset<Row> pivotTest2 = pivotTest.groupBy(col("id")).pivot("key").count();
What I would actually like, is in the place of the count, to put the value of the column Value, i.e., something of the form .select(col("Value"))
, or even .max("Value")
would work fine, but I cannot since Value
is not an arithmetic column.