0

I have a dataframe as shown below

+----------+----+----+----+
|      date|col1|col2|col3|
+----------+----+----+----+
|2021-05-01|  20|  30|  40|
|2021-05-02| 200| 300|  10|
+----------+----+----+----+

I wish to pivot/transpose this dataframe as

+-----+----------+----------+
|col  |2021-05-01|2021-05-02|
+-----+----------+----------+
|Col1 |        20|       200|
|Col1 |        30|       300|
|Col1 |        40|        10|
+-----+----------+----------+

Other stackoverflow articles like this and this helped me to some extent but I have been able to reach to a solution.

My approaches were (all failed attempts)

scala> dUnion.groupBy("date").pivot("date").agg(first("col1")).show()
+----------+----------+----------+
|      date|2021-05-01|2021-05-02|
+----------+----------+----------+
|2021-05-02|      null|       200|
|2021-05-01|        20|      null|
+----------+----------+----------+

scala> dUnion.groupBy("date", "col1", "col2", "col3").pivot("date").agg(first("col1")).show()
+----------+----+----+----+----------+----------+
|      date|col1|col2|col3|2021-05-01|2021-05-02|
+----------+----+----+----+----------+----------+
|2021-05-02| 200| 300|  10|      null|       200|
|2021-05-01|  20|  30|  40|        20|      null|
+----------+----+----+----+----------+----------+

But the closet I could come up with was

scala> dUnion.groupBy().pivot("date").agg(first("col1")).show()
+----------+----------+
|2021-05-01|2021-05-02|
+----------+----------+
|        20|       200|
+----------+----------+
Nimantha
  • 6,405
  • 6
  • 28
  • 69
underwood
  • 845
  • 2
  • 11
  • 22

1 Answers1

2

This can be possible, but I think this is a bit slow.

val schema = df.schema
val longForm = df.flatMap(row => {
    val col = row.getString(0)
    (1 until row.size).map(i => {
        (col, schema(i).name, row.getString(i))
    })
})

longForm.groupBy('_2).pivot('_1).agg(first('_3))
.withColumnRenamed("_2", "col").show(10, false)


+----+----------+----------+
|col |2021-05-01|2021-05-02|
+----+----------+----------+
|col3|40        |10        |
|col1|20        |200       |
|col2|30        |300       |
+----+----------+----------+
Lamanus
  • 12,898
  • 4
  • 21
  • 47
  • Thank you @Lamanus. This solution worked for me. `stack` is another way of looking at this problem and it worked for me. – underwood Jun 16 '21 at 13:44