
I have a DataFrame in Spark like the one below, and I want to convert all the code columns into separate rows, keyed by the first column, id.

+---+-----+-----+-----+-----+-----+
| id|code1|code2|code3|code4|code5|
+---+-----+-----+-----+-----+-----+
|  1|    A|    B|    C|    D|    E|
|  1|    M|    N|    O|    P|    Q|
|  1|    P|    Q|    R|    S|    T|
|  2|    P|    A|    C|    D|    F|
|  2|    S|    D|    F|    R|    G|
+---+-----+-----+-----+-----+-----+

I want the output in the format below:

+---+----+
| id|code|
+---+----+
|  1|   A|
|  1|   B|
|  1|   C|
|  1|   D|
|  1|   E|
|  1|   M|
|  1|   N|
|  1|   O|
|  1|   P|
|  1|   Q|
|  1|   P|
|  1|   Q|
|  1|   R|
|  1|   S|
|  1|   T|
|  2|   P|
|  2|   A|
|  2|   C|
|  2|   D|
|  2|   F|
|  2|   S|
|  2|   D|
|  2|   F|
|  2|   R|
|  2|   G|
+---+----+

Can anyone help me get the above output with Spark and Scala?

zero323
Divas Nikhra

2 Answers


Using the array, explode, and drop functions should give you the desired output:

import org.apache.spark.sql.functions.{array, explode}

df.withColumn("code", explode(array("code1", "code2", "code3", "code4", "code5")))
  .drop("code1", "code2", "code3", "code4", "code5")

OR

as suggested by undefined_variable, you can just use select:

import spark.implicits._

df.select($"id", explode(array("code1", "code2", "code3", "code4", "code5")).as("code"))
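To see what `explode(array(...))` does conceptually, here is a plain-Scala sketch (local collections, not Spark, with a small hypothetical sample of the data above): each input row fans out into one (id, code) pair per code column, exactly like a flatMap.

```scala
// Sample rows mirroring the question's data: (id, code1..code5)
val rows = Seq(
  (1, "A", "B", "C", "D", "E"),
  (1, "M", "N", "O", "P", "Q"),
  (2, "P", "A", "C", "D", "F")
)

// Equivalent of explode(array(...)): one output pair per code column.
val exploded = rows.flatMap { case (id, c1, c2, c3, c4, c5) =>
  Seq(c1, c2, c3, c4, c5).map(code => (id, code))
}
// 3 input rows x 5 code columns = 15 (id, code) pairs
```

In Spark the same fan-out happens row by row inside the executors; the array just groups the five columns so explode has something to iterate over.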
Ramesh Maharjan
import org.apache.spark.sql.functions.{col, concat_ws, explode, split}

df.select(col("id"),
  explode(split(concat_ws(",", col("code1"), col("code2"), col("code3"), col("code4"), col("code5")), ",")).as("code"))

The basic idea is to first concatenate all the required columns into a single string, split it back into an array, and then explode that array.
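The concat-then-split step can be sketched with plain Scala strings (local data, not Spark; the sample row is hypothetical):

```scala
// One input row: (id, code1..code5)
val row = (1, "A", "B", "C", "D", "E")

// concat_ws(",", ...): join the code columns into one comma-separated string
val joined = Seq(row._2, row._3, row._4, row._5, row._6).mkString(",")

// split + explode: break the string back apart, one (id, code) pair per piece
val codes = joined.split(",").toSeq.map(code => (row._1, code))
```

Note that going through a string is lossy if any code value itself contains the delimiter; the array-based approach in the first answer avoids that problem.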

undefined_variable