I am using Spark SQL 2.2.0 and the DataFrame/Dataset API.
I need to explode several columns into rows, one column value per row.
I have:
+------+------+------+------+------+
|col1  |col2  |col3  |col4  |col5  |
+------+------+------+------+------+
|val11 |val21 |val31 |val41 |val51 |
|val12 |val22 |val32 |val42 |val52 |
+------+------+------+------+------+
And I need to have the following:
+------+------+---------+---------+
|col1  |col2  |col_num  |col_new  |
+------+------+---------+---------+
|val11 |val21 |col3     |val31    |
|val11 |val21 |col4     |val41    |
|val11 |val21 |col5     |val51    |
|val12 |val22 |col3     |val32    |
|val12 |val22 |col4     |val42    |
|val12 |val22 |col5     |val52    |
+------+------+---------+---------+
I managed to combine the columns into an array and explode it like this:
val df2 = df.select(col("col1"), col("col2"), array(col("col3"), col("col4"), col("col5")) as "array")
val df3 = df2.withColumn("array", explode(col("array")))
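For reference, df3 then contains the following, so the name of the source column is already gone at this point:

+------+------+------+
|col1  |col2  |array |
+------+------+------+
|val11 |val21 |val31 |
|val11 |val21 |val41 |
|val11 |val21 |val51 |
|val12 |val22 |val32 |
|val12 |val22 |val42 |
|val12 |val22 |val52 |
+------+------+------+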
This works, but it does not add the col_num column (which I need).
I also tried to do it with flatMap and a custom mapping function, but it fails.
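Roughly, my flatMap attempt looked like this (a simplified sketch; I am assuming all of the columns are strings):

import org.apache.spark.sql.Row

// Simplified sketch of the failing attempt: emit one Row per col3/col4/col5,
// carrying the source column name along with its value.
// This does not compile because Spark cannot find an implicit Encoder for Row,
// and I am not sure how to provide the target schema.
val unpivotCols = Seq("col3", "col4", "col5")
val df4 = df.flatMap { row =>
  unpivotCols.map { c =>
    Row(row.getAs[String]("col1"), row.getAs[String]("col2"), c, row.getAs[String](c))
  }
}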
Could you please help me figure out how to do this?