As you can see in my picture, I have a column named `probability`, and I want to create a new column by extracting values from it (it looks like an array). However, when I try to do so, I get this error:

"Can't extract value from probability#52427: need struct type but got struct<type:tinyint,size:int,indices:array<int>,values:array<double>>"

Here is my extraction code:

from pyspark.sql.functions import col
preds_test = preds.withColumn("newCol", col("probability").getItem(3))

Can someone please tell me what I did wrong?
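The error message is the key clue. `probability`, presumably produced by a `pyspark.ml` classifier, is an ML `Vector` rather than an `array`, and Spark stores vectors using the struct layout shown in the message, so `getItem(3)` cannot index into it. The plain-Python sketch below only illustrates how an element is recovered from that layout. It assumes PySpark's vector encoding, where `type` 0 means sparse and 1 means dense, and the helper name `vector_struct_get` is hypothetical, not a Spark API.

```python
from typing import List, Optional, Tuple

# Layout from the error message:
# struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
VectorStruct = Tuple[int, Optional[int], Optional[List[int]], List[float]]


def vector_struct_get(vec: VectorStruct, i: int) -> float:
    """Return element i of a serialized ML vector (hypothetical helper)."""
    vtype, size, indices, values = vec
    if vtype == 1:  # dense (assumed code): every element is stored in `values`
        return values[i]
    if vtype == 0:  # sparse (assumed code): only non-zero entries are stored
        if not 0 <= i < size:
            raise IndexError(i)
        # A linear scan keeps the sketch simple; real indices are sorted.
        for idx, val in zip(indices, values):
            if idx == i:
                return val
        return 0.0  # positions absent from `indices` are implicit zeros
    raise ValueError(f"unknown vector type {vtype}")


dense = (1, None, None, [0.7, 0.3])
sparse = (0, 4, [3], [0.9])
print(vector_struct_get(dense, 1))   # 0.3
print(vector_struct_get(sparse, 3))  # 0.9
print(vector_struct_get(sparse, 0))  # 0.0
```

In practice you would not decode the struct by hand. On Spark 3.0 or later, `vector_to_array` from `pyspark.ml.functions` converts the column to `array<double>`, after which `.getItem(3)` works; that module does not exist in earlier releases. On older versions, a Python UDF that indexes the vector is the usual workaround.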

Aaron_ab
RoOt_Klem
  • Does this answer your question? [How to split Vector into columns - using PySpark](https://stackoverflow.com/questions/38384347/how-to-split-vector-into-columns-using-pyspark) – user10938362 Mar 08 '20 at 12:27
  • While trying the first option, I get this error: **No module named 'pyspark.ml.functions'** – RoOt_Klem Mar 08 '20 at 12:51

1 Answer

-5

I figured it out. I used a lambda function. This is my code:

from pyspark.sql.functions import asc
preds_subset = preds.select('CustomerID', 'prediction', probs_churn('probability')).orderBy(asc("probability"))
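The answer does not show how `probs_churn` is defined. One pattern consistent with "I used a lambda function" is a Python UDF wrapping a lambda that indexes the vector. The sketch below is a hypothetical reconstruction: the names `prob_at` and `make_prob_udf`, and the assumption about which index holds the churn probability, are not from the original answer.

```python
def prob_at(vector, index: int) -> float:
    """Return one class probability as a built-in Python float.

    DenseVector and SparseVector both support integer indexing. float()
    converts the numpy scalar a DenseVector returns into a plain float,
    a common precaution for UDFs declared with DoubleType.
    """
    return float(vector[index])


def make_prob_udf(index: int):
    """Hypothetical stand-in for `probs_churn`: a lambda wrapped in a Spark UDF."""
    # Spark imports are kept local so prob_at is usable without a Spark install.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    return udf(lambda v: prob_at(v, index), DoubleType())


# Pure-Python check, with a list standing in for a 2-class DenseVector.
print(prob_at([0.25, 0.75], 1))  # 0.75
```

For a binary model where index 1 is assumed to be the churn class, `probs_churn = make_prob_udf(1)` could then be passed to the `select` above. Python UDFs move every row between the JVM and Python workers, so a built-in column function is preferable where one exists.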
RoOt_Klem