Spark - Transpose DataFrame Columns

Question

I have a table that looks like the following:

ID     Type      5m      10m     15m
1      A         3       9       13
1      B         7       8       22
1      C         5       11      13
2      A         1       3       20
2      B         16      17      30
...

If possible, I would like to create new columns in the following format:

ID     A_5m     A_10m     A_15m    B_5m    B_10m     B_15m     C_5m     C_10m     C_15m

I am currently referencing the following SO: How to transpose/pivot the rows data to column in Spark Scala?

Its good for creating the new columns: A B, but I am lost when it comes to creating the types plus the distance.

Any ideas?

score 0 · Accepted Answer · answered May 14 '18 at 17:42

Its good for creating the new columns: A B, but I am lost when it comes to creating the types plus the distance.

It is not different. You can apply a number of aggregates in single pivot:

val df = Seq(
   (1, "A", 3 , 9 , 13), (1, "B", 7 , 8 , 22),(1, "C", 5 , 11, 13),
   (2, "A", 1 , 3 , 20), (2, "B", 16, 17, 30)
).toDF("id", "type", "5m", "10m", "15m")

df.groupBy("id").pivot("type").agg(
  first("5m") as "5m", first("10m") as "10m", first("15m") as "15m"
).show
+---+----+-----+-----+----+-----+-----+----+-----+-----+ 
| id|A_5m|A_10m|A_15m|B_5m|B_10m|B_15m|C_5m|C_10m|C_15m|
+---+----+-----+-----+----+-----+-----+----+-----+-----+
|  1|   3|    9|   13|   7|    8|   22|   5|   11|   13|
|  2|   1|    3|   20|  16|   17|   30|null| null| null|
+---+----+-----+-----+----+-----+-----+----+-----+-----+

Spark will automatically generate names based on base name and level.

Spark - Transpose DataFrame Columns

1 Answers1