I have a DataFrame column that contains a vector (an array of values). I want to convert this array into separate columns.
>>> from pyspark.sql import Row
>>> from pyspark.mllib.linalg import DenseVector
>>> df = spark.createDataFrame([Row(a=1, intlist=DenseVector([1,2,3])), Row(a=2, intlist=DenseVector([4,5,6]))])
>>> df.show()
+---+-------------+
| a| intlist|
+---+-------------+
| 1|[1.0,2.0,3.0]|
| 2|[4.0,5.0,6.0]|
+---+-------------+
Expected output:
+---+---+---+---+
| a| _1| _2| _3|
+---+---+---+---+
| 1| 1| 2| 3|
| 2| 4| 5| 6|
+---+---+---+---+
The explode function comes close, but it adds rows instead of columns:
>>> from pyspark.sql.functions import explode
>>> df.select(explode(df.intlist).alias("anInt")).show()
+-----+
|anInt|
+-----+
| 1|
| 2|
| 3|
| 4|
| 5|
| 6|
+-----+
Is there a way to add columns instead of rows?