
I am using Spark SQL 2.2.0 and DataFrame/DataSet API.

I need to explode several columns, one per row.

I have:

+------+------+------+------+------+
|col1  |col2  |col3  |col4  |col5  |
+------+------+------+------+------+
|val11 |val21 |val31 |val41 |val51 |
|val12 |val22 |val32 |val42 |val52 |
+------+------+------+------+------+

And I need to have the following:

+------+------+---------+---------+
|col1  |col2  |col_num  |col_new  |
+------+------+---------+---------+
|val11 |val21 |col3     |val31    |
|val11 |val21 |col4     |val41    |
|val11 |val21 |col5     |val51    |
|val12 |val22 |col3     |val32    |
|val12 |val22 |col4     |val42    |
|val12 |val22 |col5     |val52    |
+------+------+---------+---------+

I managed to combine the columns into an array and explode it like this:

val df2 = df.select(col("col1"), col("col2"), array(col("col3"), col("col4"), col("col5")) as "array")
val df3 = df2.withColumn("array", explode(col("array")))

This works, but it does not add the col_num column (which I need). I tried to do it with flatMap using a custom map function, but it fails.
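For reference, one way to keep the source column name is to pack each name/value pair into a struct before exploding, then flatten the struct. This is a sketch only (assuming `df` is the original DataFrame and all unpivoted columns share a type, as strings do here):

```scala
import org.apache.spark.sql.functions._

// Columns to unpivot into (col_num, col_new) pairs.
val cols = Seq("col3", "col4", "col5")

// For each column, build a struct of (literal column name, column value),
// collect the structs into an array, and explode one struct per row.
val exploded = df.select(
  col("col1"),
  col("col2"),
  explode(array(cols.map(c =>
    struct(lit(c).as("col_num"), col(c).as("col_new"))
  ): _*)).as("kv")
)

// Flatten the struct fields into top-level columns.
val result = exploded.select(
  col("col1"), col("col2"), col("kv.col_num"), col("kv.col_new")
)
```

Each input row produces one output row per entry in `cols`, with `col_num` carrying the original column name.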

Could you please help me do this?

  • Also this: https://stackoverflow.com/questions/42465568/unpivot-in-spark-sql-pyspark – philantrovert May 22 '18 at 12:07
  • The first answer does not seem to be relevant. However, the second one (https://stackoverflow.com/questions/42465568/unpivot-in-spark-sql-pyspark) is the silver bullet for me! Thank you so much! – alex-arkhipov May 22 '18 at 13:42
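The linked unpivot question relies on SQL's `stack` generator, which should also be reachable from the Scala API via `selectExpr` (a sketch, not verified against this exact schema; the literal names are taken from the tables above):

```scala
// stack(n, name1, val1, ..., nameN, valN) emits n rows of (name, value)
// pairs; here it yields the (col_num, col_new) columns directly.
val result = df.selectExpr(
  "col1", "col2",
  "stack(3, 'col3', col3, 'col4', col4, 'col5', col5) as (col_num, col_new)"
)
```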

0 Answers