I have a data frame and I would like to use Scala to explode rows into multiple rows using the values in multiple columns. Ideally I am looking to replicate the behavior of the R function melt().
All the columns contain Strings.
Example: I want to transform this data frame...
df.show
+----+-----------+-----------+-----+----+
|col1|       col2|       col3| res1|res2|
+----+-----------+-----------+-----+----+
|   a|   baseline|equivalence| TRUE| 0.1|
|   a|experiment1|equivalence|FALSE|0.01|
|   b|   baseline|equivalence| TRUE| 0.2|
|   b|experiment1|equivalence|FALSE|0.02|
+----+-----------+-----------+-----+----+
...into this data frame:
+----+-----------+-----------+----+-----+
|col1|       col2|       col3| key|value|
+----+-----------+-----------+----+-----+
|   a|   baseline|equivalence|res1| TRUE|
|   a|experiment1|equivalence|res1|FALSE|
|   b|   baseline|equivalence|res1| TRUE|
|   b|experiment1|equivalence|res1|FALSE|
|   a|   baseline|equivalence|res2|  0.1|
|   a|experiment1|equivalence|res2| 0.01|
|   b|   baseline|equivalence|res2|  0.2|
|   b|experiment1|equivalence|res2| 0.02|
+----+-----------+-----------+----+-----+
- Is there a built-in function in Scala which applies to datasets or data frames to do this?
- If not, would it be relatively simple to implement this? How would it be done at a high level?
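To make the second question concrete, here is a rough sketch of the kind of helper I have in mind. It assumes the data frame is a Spark DataFrame and that all the columns being melted share one type (they are all Strings in my case). The name melt and its signature are my own invention, not an existing built-in, and I have not verified that this is the best approach:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Hypothetical helper: the name and signature are assumptions, not a library API.
// idCols are kept as-is; each column in valueCols becomes one (key, value) row.
def melt(df: DataFrame, idCols: Seq[String], valueCols: Seq[String]): DataFrame = {
  // Build one struct(key, value) per column to unpivot.
  // All value columns must have the same type so they fit in a single array.
  val pairs = valueCols.map(c => struct(lit(c).as("key"), col(c).as("value")))

  df
    // explode turns the array of structs into one row per struct.
    .withColumn("kv", explode(array(pairs: _*)))
    // Keep the id columns and flatten the struct into "key" and "value".
    .select(idCols.map(col) :+ col("kv.key") :+ col("kv.value"): _*)
}
```

With the example above, the call would be melt(df, Seq("col1", "col2", "col3"), Seq("res1", "res2")). The rows would come out interleaved per input row rather than grouped by key, so an extra orderBy("key") would be needed to match the ordering shown in the desired output.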
Note: I have found the class UnpivotOp from SMV, which would do exactly what I want (https://github.com/TresAmigosSD/SMV/blob/master/src/main/scala/org/tresamigos/smv/UnpivotOp.scala).
Unfortunately, the class is private, so I cannot do something like this:
import org.tresamigos.smv.UnpivotOp
val melter = new UnpivotOp(df, Seq("res1","res2"))
val melted_df = melter.unpivot()
Does anyone know if there is a way to access the class org.tresamigos.smv.UnpivotOp via some other class or static method of SMV?
Thanks!