Suppose I have a column of arrays like this:

column_x
[1,5,[],[2,3,22,42,3,-5]]
[1,5,[],[-3,67,32,2,2.14,5]]
[1,5,[],[32,1,3,34,6.7,90]]

I want to extract the fourth element of the array in each row, and separate these elements into different columns like this:

column1 column2 column3 column4 column5 column6
2       3       22      42      3       -5
-3      67      32      2       2.14    5
32      1       3       34      6.7     90

I tried using the getItem() function but it's not working. I'm not entirely sure if I'm using it correctly.

Outlier

1 Answer

Since Spark 3.0.0 you can use vector_to_array from pyspark.ml.functions, which converts a column of MLlib vectors into a column of arrays.

https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.functions.vector_to_array.html

Since your data has an array nested inside it, you might have to apply it twice.

Gaarv