How to create new rows in a dataset from the values of an array column

I have a dataset with the following data:

+----+---------+-------------------+------------------+
|name|productId|              total|            scores|
+----+---------+-------------------+------------------+
| aaa|      200|               0.29|            [0.29]|
| bbb|      200| 1.3900000000000001|      [0.53, 0.33]|
| aaa|      100|0.22999999999999998|      [0.12, 0.11]|
+----+---------+-------------------+------------------+

I want to transform it into the format below, using Scala, so that each element of scores gets its own row:

+----+---------+-------------------+------------------+
|name|productId|              total|            scores|
+----+---------+-------------------+------------------+
| aaa|      200|               0.29|            0.29  |
| bbb|      200| 1.3900000000000001|            0.53  |
| bbb|      200| 1.3900000000000001|            0.33  |
| aaa|      100|0.22999999999999998|            0.12  |
| aaa|      100|0.22999999999999998|            0.11  |
+----+---------+-------------------+------------------+
Sprasad

1 Answer

That's exactly what the explode function is meant for. It produces one output row per element of an array column:

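import org.apache.spark.sql.functions.{col, explode}
df.withColumn("scores", explode(col("scores")))  // reusing the name "scores" replaces the array column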
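
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies explode. The object name ExplodeScoresExample, the helper explodeScores, and the local SparkSession settings are assumptions made for illustration; they are not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeScoresExample {
  // Hypothetical helper: replaces the array column "scores" with
  // one row per array element, keeping the other columns unchanged.
  def explodeScores(df: DataFrame): DataFrame =
    df.withColumn("scores", explode(col("scores")))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (an assumption, not from the post).
    val spark = SparkSession.builder()
      .appName("explode-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val df = Seq(
      ("aaa", 200, 0.29, Seq(0.29)),
      ("bbb", 200, 1.3900000000000001, Seq(0.53, 0.33)),
      ("aaa", 100, 0.22999999999999998, Seq(0.12, 0.11))
    ).toDF("name", "productId", "total", "scores")

    // Prints the exploded dataset: one row per score.
    explodeScores(df).show()

    spark.stop()
  }
}
```

Note that explode drops rows whose array is null or empty. If those rows should be kept (with a null score), explode_outer from the same functions object may be a better fit.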
Oli