
I have a DataFrame in Spark like the one below, and I want to convert all the code columns into separate rows, keyed by the first column, id.

+---+-----+-----+-----+-----+-----+
| id|code1|code2|code3|code4|code5|
+---+-----+-----+-----+-----+-----+
|  1|    A|    B|    C|    D|    E|
|  1|    M|    N|    O|    P|    Q|
|  1|    P|    Q|    R|    S|    T|
|  2|    P|    A|    C|    D|    F|
|  2|    S|    D|    F|    R|    G|
+---+-----+-----+-----+-----+-----+

I want the output in the format below:

+---+----+
| id|code|
+---+----+
|  1|   A|
|  1|   B|
|  1|   C|
|  1|   D|
|  1|   E|
|  1|   M|
|  1|   N|
|  1|   O|
|  1|   P|
|  1|   Q|
|  1|   P|
|  1|   Q|
|  1|   R|
|  1|   S|
|  1|   T|
|  2|   P|
|  2|   A|
|  2|   C|
|  2|   D|
|  2|   F|
|  2|   S|
|  2|   D|
|  2|   F|
|  2|   R|
|  2|   G|
+---+----+

Can anyone help me get the above output with Spark and Scala?

zero323
Divas Nikhra

2 Answers


Using the array, explode, and drop functions should give you the desired output:

import org.apache.spark.sql.functions.{array, explode}

df.withColumn("code", explode(array("code1", "code2", "code3", "code4", "code5")))
  .drop("code1", "code2", "code3", "code4", "code5")

OR

as suggested by undefined_variable, you can just use select:

import spark.implicits._

df.select($"id", explode(array("code1", "code2", "code3", "code4", "code5")).as("code"))
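To see what `explode(array(...))` does conceptually, here is a plain-Scala sketch (local collections, not Spark, with a small hypothetical sample of the data above): each input row fans out into one (id, code) pair per code column, exactly like a flatMap.

```scala
// Sample rows mirroring the question's data: (id, code1..code5)
val rows = Seq(
  (1, "A", "B", "C", "D", "E"),
  (1, "M", "N", "O", "P", "Q"),
  (2, "P", "A", "C", "D", "F")
)

// Equivalent of explode(array(...)): one output pair per code column.
val exploded = rows.flatMap { case (id, c1, c2, c3, c4, c5) =>
  Seq(c1, c2, c3, c4, c5).map(code => (id, code))
}
// 3 input rows x 5 code columns = 15 (id, code) pairs
```

In Spark the same fan-out happens row by row inside the executors; the array just groups the five columns so explode has something to iterate over.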
Ramesh Maharjan
import org.apache.spark.sql.functions.{col, concat_ws, explode, split}

df.select(col("id"),
  explode(split(concat_ws(",", col("code1"), col("code2"), col("code3"), col("code4"), col("code5")), ",")).as("code"))

The basic idea is to first concatenate all the required columns into a single string, split it back into an array, and then explode that array.
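The concat-then-split step can be sketched with plain Scala strings (local data, not Spark; the sample row is hypothetical):

```scala
// One input row: (id, code1..code5)
val row = (1, "A", "B", "C", "D", "E")

// concat_ws(",", ...): join the code columns into one comma-separated string
val joined = Seq(row._2, row._3, row._4, row._5, row._6).mkString(",")

// split + explode: break the string back apart, one (id, code) pair per piece
val codes = joined.split(",").toSeq.map(code => (row._1, code))
```

Note that going through a string is lossy if any code value itself contains the delimiter; the array-based approach in the first answer avoids that problem.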

undefined_variable