I'm new to Spark and I'm struggling to convert a column of `Array[Double]` into n columns. For example, I want to convert this:

+------------------+
|            vector|
+------------------+
| [19.224, 46.9505]|
+------------------+

to this:

+-----------------+------------------+
|          vector1|           vector2|
+-----------------+------------------+
|           19.224|           46.9505|
+-----------------+------------------+

Ideally I'd also reapply the original schema that I stored beforehand. I've tried the `explode` SQL function, but it spreads all the values down a single column, like this:

+------------------+
|           vector3|
+------------------+
|            19.224|
|           46.9505|
+------------------+

Is there a way to do this without using UDFs?

Any help would be appreciated.

ggagliano
  • Or, if it is a `VectorUDT` - [Spark Scala: How to convert Dataframe\[vector\] to DataFrame\[f1: Double, …, fn: Double\]](https://stackoverflow.com/a/38110524/8371915) – Alper t. Turker Jun 05 '18 at 10:29
  • 2
    Possible duplicate of [Convert Array of String column to multiple columns in spark scala](https://stackoverflow.com/questions/50362265/convert-array-of-string-column-to-multiple-columns-in-spark-scala) and https://stackoverflow.com/questions/49911608/scala-spark-split-vector-column-into-separate-columns-in-a-spark-dataframe – Ramesh Maharjan Jun 05 '18 at 10:49
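
Following up on the first comment: if the column is actually an ML `VectorUDT` rather than a plain `Array[Double]`, a minimal sketch (assuming Spark 3.0+, where `vector_to_array` is available in `org.apache.spark.ml.functions`; on earlier versions the linked answer's UDF approach applies) is to convert it to an array first and then proceed as in the answer below:

import org.apache.spark.ml.functions.vector_to_array

// Hypothetical step, assuming Spark 3.0+: replace the ML Vector column
// with an equivalent Array[Double] column, then index into it as below.
val dfAsArray = df.withColumn("vector", vector_to_array($"vector"))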

1 Answer

You can do it like this, indexing into the array column directly:

import spark.implicits._ // for the $"..." column syntax (assumes a SparkSession named `spark`)

df
  .select(
    $"vector"(0).as("vector1"), // first element of the array
    $"vector"(1).as("vector2")  // second element
  )
  .show()

+-------+-------+
|vector1|vector2|
+-------+-------+
| 19.224|46.9505|
+-------+-------+

Or, more generically:

val N = 2

// Build one select expression per index: vector(0) -> vector1, vector(1) -> vector2, ...
val selectExpr = (0 until N).map(i => $"vector"(i).as(s"vector${i + 1}"))

df
  .select(selectExpr: _*)
  .show()
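
If `N` isn't known up front, one option (a sketch, not part of the original answer; it assumes `spark.implicits._` is in scope and that every row's array has the same length) is to derive it from the data:

import org.apache.spark.sql.functions.{max, size}

// Take the largest array length in the column as the number of output columns.
val N = df.select(max(size($"vector"))).as[Int].first()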

Raphael Roth