I have seen a few solutions for unpivoting a Spark DataFrame when the number of columns is reasonably low and the column names can be hardcoded. Is there a scalable solution for unpivoting a DataFrame with numerous columns?
Below is a toy problem.
Input:
val df = Seq(
(1,1,1,0),
(2,0,0,1)
).toDF("ID","A","B","C")
+---+---+---+---+
| ID|  A|  B|  C|
+---+---+---+---+
|  1|  1|  1|  0|
|  2|  0|  0|  1|
+---+---+---+---+
Expected result:
+---+-----+-----+
| ID|names|count|
+---+-----+-----+
|  1|    A|    1|
|  1|    B|    1|
|  1|    C|    0|
|  2|    A|    0|
|  2|    B|    0|
|  2|    C|    1|
+---+-----+-----+
The solution should be applicable to datasets with N columns to unpivot, where N is large (say 100 columns).
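For reference, here is a minimal sketch of one direction that might generalize to N columns without hardcoding names. It reads the column names from the schema, wraps each one in a (names, count) struct, and explodes the resulting array. The object name `UnpivotSketch`, the helper `unpivot`, and the `idCol` parameter are hypothetical names introduced for illustration. The sketch assumes every unpivoted column has the same type (integers here) and that column names contain no dots. It is not necessarily the most efficient approach.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Hypothetical helper object, introduced only for illustration.
object UnpivotSketch {

  // Unpivots every column except `idCol` into (names, count) rows.
  // Assumption: all unpivoted columns share one data type, since they
  // end up as elements of a single array column.
  def unpivot(df: DataFrame, idCol: String): DataFrame = {
    // Take the column names from the schema instead of hardcoding them.
    val valueCols = df.columns.filterNot(_ == idCol)

    // One struct per source column: (column name, column value).
    val pairs = valueCols.map(c => struct(lit(c).as("names"), col(c).as("count")))

    df.select(col(idCol), explode(array(pairs: _*)).as("kv"))
      .select(col(idCol), col("kv.names"), col("kv.count"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("unpivot-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, 1, 1, 0),
      (2, 0, 0, 1)
    ).toDF("ID", "A", "B", "C")

    unpivot(df, "ID").show()
    spark.stop()
  }
}
```

Because the column list comes from `df.columns`, the same call should work unchanged for 3 or 100 columns. Columns of mixed types would need to be cast to a common type first.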