1

Input Dataset:

CustomerID CustomerName Sun Mon Tue
   1          ABC        0   12  10
   2          DEF        10   0   0

Required Output Dataset:

CustomerID CustomerName Day  Value
    1         ABC       Sun   0
    1         ABC       Mon   12
    1         ABC       Tue   10
    2         DEF       Sun   10
    2         DEF       Mon   0
    2         DEF       Tue   0

Please note, that the number of "Sun Mon Tue" columns are 82 in my dataset!

1 Answers1

3

Assuming that your input dataset is generated using a case class as

case class infos(CustomerID: Int, CustomerName: String, Sun: Int, Mon: Int, Tue: Int)

For testing purpose I am creating a dataset as

import sqlContext.implicits._
val ds = Seq(
  infos(1, "ABC", 0, 12, 10),
  infos(2, "DEF", 10, 0, 0)
).toDS

which should give your input dataset

+----------+------------+---+---+---+
|CustomerID|CustomerName|Sun|Mon|Tue|
+----------+------------+---+---+---+
|1         |ABC         |0  |12 |10 |
|2         |DEF         |10 |0  |0  |
+----------+------------+---+---+---+

Getting your final required dataset requires you to create another case class as

case class finalInfos(CustomerID: Int, CustomerName: String, Day: String, Value: Int)

Final required dataset can be achieved by doing the following

val names = ds.schema.fieldNames

ds.flatMap(row => Array(finalInfos(row.CustomerID, row.CustomerName, names(2), row.Sun),
  finalInfos(row.CustomerID, row.CustomerName, names(3), row.Mon),
  finalInfos(row.CustomerID, row.CustomerName, names(4), row.Tue)))

which should give you dataset as

+----------+------------+---+-----+
|CustomerID|CustomerName|Day|Value|
+----------+------------+---+-----+
|1         |ABC         |Sun|0    |
|1         |ABC         |Mon|12   |
|1         |ABC         |Tue|10   |
|2         |DEF         |Sun|10   |
|2         |DEF         |Mon|0    |
|2         |DEF         |Tue|0    |
+----------+------------+---+-----+
Ramesh Maharjan
  • 41,071
  • 6
  • 69
  • 97