0

I have the dataframe as follows :

ColA  ColB     ColC  
1     [2,3,4] [5,6,7]

I need to convert it to the below

ColA ColB ColC  
1    2    5  
1    3    6  
1    4    7  

Can someone please help with the Code in SCALA?

3 Answers3

2

You can zip the two array columns by means of a UDF and explode the zipped column as follows:

val df = Seq(
  (1, Seq(2, 3, 4), Seq(5, 6, 7))
).toDF("ColA", "ColB", "ColC")

def zip = udf(
  (x: Seq[Int], y: Seq[Int]) => x zip y 
)

val df2 = df.select($"ColA", zip($"ColB", $"ColC").as("BzipC")).
  withColumn("BzipC", explode($"BzipC"))

val df3 = df2.select($"ColA", $"BzipC._1".as("ColB"), $"BzipC._2".as("ColC"))

df3.show
+----+----+----+
|ColA|ColB|ColC|
+----+----+----+
|   1|   2|   5|
|   1|   3|   6|
|   1|   4|   7|
+----+----+----+
Leo C
  • 22,006
  • 3
  • 26
  • 39
0

The idea I am presenting here is a bit complex which requires you to use map to combine the two arrays of ColB and ColC. Then use the explode function to explode the combined array. and finally extract the exploded combined array to different columns.

import org.apache.spark.sql.functions._
val tempDF = df.map(row => {
  val colB = row(1).asInstanceOf[mutable.WrappedArray[Int]]
  val colC = row(2).asInstanceOf[mutable.WrappedArray[Int]]
  var array = Array.empty[(Int, Int)]
  for(loop <- 0 to colB.size-1){
    array = array :+ (colB(loop), colC(loop))
  }
  (row(0).asInstanceOf[Int], array)
})
  .toDF("ColA", "ColB")
  .withColumn("ColD", explode($"ColB"))

tempDF.withColumn("ColB", $"ColD._1").withColumn("ColC", $"ColD._2").drop("ColD").show(false)

this would give you result as

+----+----+----+
|ColA|ColB|ColC|
+----+----+----+
|1   |2   |5   |
|1   |3   |6   |
|1   |4   |7   |
+----+----+----+
Ramesh Maharjan
  • 41,071
  • 6
  • 69
  • 97
0

You can also use a combination of posexplode and lateral view from HiveQL

sqlContext.sql(""" 
select 1 as colA, array(2,3,4) as colB, array(5,6,7) as colC 
""").registerTempTable("test")

sqlContext.sql(""" 
select 
    colA , b as colB, c as colC 
from 
    test 
lateral view 
    posexplode(colB) columnB as seqB, b 
lateral view 
    posexplode(colC) columnC as seqC, c 
where 
    seqB = seqC 
""" ).show

+----+----+----+
|colA|colB|colC|
+----+----+----+
|   1|   2|   5|
|   1|   3|   6|
|   1|   4|   7|
+----+----+----+

Credits: https://stackoverflow.com/a/40614822/7224597 ;)

philantrovert
  • 9,904
  • 3
  • 37
  • 61