
I have a column in a DataFrame that contains a vector. I want to convert this vector into separate columns.

>>> from pyspark.sql import Row
>>> from pyspark.mllib.linalg import DenseVector
>>> df = spark.createDataFrame([Row(a=1, intlist=DenseVector([1,2,3])), Row(a=2, intlist=DenseVector([4,5,6]))])
>>> df.show()
+---+-------------+
|  a|      intlist|
+---+-------------+
|  1|[1.0,2.0,3.0]|
|  2|[4.0,5.0,6.0]|
+---+-------------+

Expected output:

+---+---+---+---+
|  a| _1| _2| _3|
+---+---+---+---+
|  1|  1|  2|  3|
|  2|  4|  5|  6|
+---+---+---+---+

The explode function comes close, but it adds rows instead of columns:

>>> from pyspark.sql.functions import explode
>>> df.select(explode(df.intlist).alias("anInt")).show()
+-----+
|anInt|
+-----+
|    1|
|    2|
|    3|
|    4|
|    5|
|    6|
+-----+

Is there a way we can add columns instead of rows?
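One route I have seen suggested (an assumption on my part, not something I have verified against this exact schema) is to go through the RDD, e.g. `df.rdd.map(lambda r: (r.a,) + tuple(r.intlist)).toDF()`, since a DenseVector can be unpacked row by row. The per-row logic that such a map would apply, sketched in plain Python without Spark (the `widen` helper and field names `_1`, `_2`, ... are hypothetical, chosen to match the expected output above):

```python
# Plain-Python sketch of flattening a vector column into _1.._n columns.
# Each input row mirrors one row of the example DataFrame.
rows = [
    {"a": 1, "intlist": [1.0, 2.0, 3.0]},
    {"a": 2, "intlist": [4.0, 5.0, 6.0]},
]

def widen(row, n):
    """Flatten row['intlist'] into columns _1.._n, keeping 'a'."""
    out = {"a": row["a"]}
    for i in range(n):
        out["_%d" % (i + 1)] = row["intlist"][i]
    return out

wide = [widen(r, 3) for r in rows]
print(wide[0])  # {'a': 1, '_1': 1.0, '_2': 2.0, '_3': 3.0}
```

In Spark the same unpacking done inside `rdd.map` would yield a DataFrame with one column per vector element; on Spark 3.0+ `pyspark.ml.functions.vector_to_array` followed by indexing the resulting array column may be an alternative worth checking.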

Clock Slave
