"Reshaping Data with Pivot in Spark" describes Spark's support for reshaping data with pivot. As I understand it, melt (also called unpivot) is roughly the reverse of pivot. I am relatively new to Spark, but with what I know I tried to implement a melt operation myself.
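To pin down what melt should produce, here is a minimal sketch of the same reshaping on plain Scala collections, without Spark. It assumes rows are modelled as `Map[String, Any]`, and the names `MeltReference` and `meltRows` are hypothetical. The id columns are kept on every output row and each remaining column becomes a (variable, value) pair, in column-by-column order.

```scala
object MeltReference {
  // Hypothetical reference: keep `idCols`, turn every other column into (variable, value).
  def meltRows(rows: Seq[Map[String, Any]], allCols: Seq[String], idCols: Seq[String]): Seq[Seq[String]] = {
    val valueCols = allCols.filterNot(c => idCols.contains(c))
    // Column-major order: every row for the first value column, then the next column, ...
    for {
      variable <- valueCols
      row <- rows
    } yield idCols.map(c => row(c).toString) ++ Seq(variable, row(variable).toString) // assumes no nulls
  }
}
```

Such a helper can serve as a small test oracle when checking a Spark implementation on tiny inputs.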
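```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Keeps the id columns in `columns` and melts every other column into (variable, value) rows
def melt(df: DataFrame, columns: List[String]): DataFrame = {
  val sqlContext = df.sqlContext
  val restOfTheColumns = df.columns.filterNot(columns.contains(_))
  val baseDF = df.select(columns.head, columns.tail: _*)
  val newStructure = StructType(baseDF.schema.fields ++ List(
    StructField("variable", StringType, true),
    StructField("value", StringType, true)))
  var newdf = sqlContext.createDataFrame(sqlContext.sparkContext.emptyRDD[Row], newStructure)
  for (variableCol <- restOfTheColumns) {
    // Stringify the column; nulls stay null instead of throwing a NullPointerException
    val colValues = df.select(variableCol).rdd.map(r => Option(r.get(0)).map(_.toString).orNull)
    // zip pairs rows by position, so both RDDs must have the same partitions and row counts
    val colRdd = baseDF.rdd.zip(colValues).map { case (row, value) =>
      Row.fromSeq(row.toSeq :+ variableCol :+ value)
    }
    val colDF = sqlContext.createDataFrame(colRdd, newStructure)
    newdf = newdf.union(colDF) // unionAll is deprecated since Spark 2.0
  }
  newdf
}
```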
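It works, but I am not sure about its efficiency: it reads the input again for every melted column and relies on `zip` plus repeated unions.

Given this input DataFrame: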
```
+-----+---+---+----------+------+
| name|sex|age|    street|weight|
+-----+---+---+----------+------+
|Alice|  f| 34| somewhere|    70|
|  Bob|  m| 63|   nowhere|   -70|
|Alice|  f|612|nextstreet|    23|
|  Bob|  m|612|      moon|     8|
+-----+---+---+----------+------+
```
it can be called as:

```scala
melt(df, List("name", "sex"))
```
The result is:
```
+-----+---+--------+----------+
| name|sex|variable|     value|
+-----+---+--------+----------+
|Alice|  f|     age|        34|
|  Bob|  m|     age|        63|
|Alice|  f|     age|       612|
|  Bob|  m|     age|       612|
|Alice|  f|  street| somewhere|
|  Bob|  m|  street|   nowhere|
|Alice|  f|  street|nextstreet|
|  Bob|  m|  street|      moon|
|Alice|  f|  weight|        70|
|  Bob|  m|  weight|       -70|
|Alice|  f|  weight|        23|
|  Bob|  m|  weight|         8|
+-----+---+--------+----------+
```
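On the efficiency question: a commonly suggested alternative (a different technique from the zip/union loop above) packs each row's value columns into an array of (variable, value) structs and explodes it, so the input is scanned once and no positional `zip` is needed. This is a minimal sketch assuming Spark 2.x or later; `MeltExplode` and its `melt` method are hypothetical names, and values are cast to string to match the `StringType` value column above.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

object MeltExplode {
  // Hypothetical helper: melts every column not in `idCols` in a single pass.
  def melt(df: DataFrame, idCols: Seq[String]): DataFrame = {
    val valueCols = df.columns.filterNot(c => idCols.contains(c))
    require(valueCols.nonEmpty, "no value columns to melt")
    // One (variable, value) struct per value column; values are cast to string
    // so that all array elements share a single type.
    val pairs = array(valueCols.map { c =>
      struct(lit(c).alias("variable"), col(c).cast("string").alias("value"))
    }: _*)
    // explode emits one output row per array element, i.e. one per value column.
    df.select((idCols.map(c => col(c)) :+ explode(pairs).alias("kv")): _*)
      .select((idCols.map(c => col(c)) ++ Seq(col("kv.variable"), col("kv.value"))): _*)
  }
}
```

Rows come out grouped by input row rather than by column, so the order differs from the table above; a DataFrame has no inherent order anyway. Spark 3.4 and later also appear to ship a built-in `Dataset.unpivot` (aliased as `melt`); check the API docs for your version before relying on it.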
I hope this is useful, and I would appreciate any comments on how it could be improved.