I have a DataFrame in Spark that looks like this:

 column_A | column_B
 ---------  --------
  1          1,12,21
  2          6,9

Both column_A and column_B are of String type.

How can I convert the above DataFrame into a new DataFrame that looks like this:

  column_new_A | column_new_B
  ------------   ------------
     1             1
     1             12
     1             21
     2             6
     2             9

Both column_new_A and column_new_B should be of String type.

himanshuIIITian
Dipanjan Das

1 Answer

You need to split column_B on the comma and then apply the explode function to the resulting array. Given the following DataFrame:

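import org.apache.spark.sql.functions.{explode, split}
import spark.implicits._  // assumes a SparkSession named spark (as in spark-shell); enables toDF and $"..."

val df = Seq(
  ("1", "1,12,21"),
  ("2", "6,9")
).toDF("column_A", "column_B")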

You can use either withColumn or select to create the new column.

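// Option 1: withColumn replaces column_B in place and keeps the original column names.
df.withColumn("column_B", explode(split($"column_B", ","))).show(false)

// Option 2: select renames both columns while exploding.
df.select($"column_A".as("column_new_A"), explode(split($"column_B", ",")).as("column_new_B")).show(false)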

Output (from the select version):

+------------+------------+
|column_new_A|column_new_B|
+------------+------------+
|1           |1           |
|1           |12          |
|1           |21          |
|2           |6           |
|2           |9           |
+------------+------------+
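
For anyone running this outside spark-shell, here is a minimal self-contained sketch of the same split-and-explode approach. The object name ExplodeExample, the helper explodeColumnB, and the local[*] master are assumptions made for illustration, not part of the original question. The second argument of split is a regular expression, so this version also tolerates spaces around the commas.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, split}

object ExplodeExample {
  // Hypothetical helper: emits one output row per comma-separated value in column_B.
  def explodeColumnB(df: DataFrame): DataFrame =
    df.select(
      col("column_A").as("column_new_A"),
      // split takes a regex; this pattern also swallows whitespace around each comma.
      explode(split(col("column_B"), "\\s*,\\s*")).as("column_new_B")
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-example")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("1", "1,12,21"), ("2", "6,9")).toDF("column_A", "column_B")

    val result = explodeColumnB(df)
    result.printSchema() // both columns should be reported as string
    result.show(false)

    spark.stop()
  }
}
```

Since split produces an array of strings, both resulting columns stay String type without an explicit cast. One thing to watch for: explode produces no output row when column_B is null, so if such rows need to be kept, explode_outer (available since Spark 2.2) may be the better fit.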
koiralo