I have the result from a pyspark.ml.classification.LogisticRegression model. One of the columns is probability, which contains the following vector:
I want to extract the value at indices 3 and 1, and I have written the following crude UDF, but it keeps returning an index error even on valid records.
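from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

def getPurchProb(o):
    # Sentinel return values show which exception was raised
    try:
        return o[3][1]
    except ValueError:
        return -1.0
    except IndexError:
        return -2.0

udfPurchProb = udf(getPurchProb, DoubleType())
result.select(udfPurchProb("probability"))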
What am I doing wrong?
Update
This issue was flagged as a duplicate of How to access element of a VectorUDT column in a Spark DataFrame?. I tried that approach before the one above and had identical errors. My issue is that I get an IndexError when trying to extract the value, not a ValueError on conversion.
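For reference, here is a minimal sketch that reproduces the same IndexError without Spark. It assumes the probability vector behaves like a NumPy array when indexed; the array values and the helper get_element are made up for illustration and are not from my actual data. Indexing once already yields a scalar, so a second index on that scalar is what fails:

```python
import numpy as np

# Hypothetical stand-in for one row's `probability` vector (values are made up).
probability = np.array([0.1, 0.2, 0.3, 0.4])

def get_purch_prob(o):
    # Same logic as the UDF body in the question.
    try:
        return o[3][1]  # o[3] is a NumPy scalar; indexing it again raises IndexError
    except ValueError:
        return -1.0
    except IndexError:
        return -2.0

def get_element(o, i):
    # Hypothetical alternative: one index per element, converted to a plain float.
    try:
        return float(o[i])
    except IndexError:
        return -2.0

result_double = get_purch_prob(probability)  # takes the IndexError branch
result_single = get_element(probability, 3)  # reads the element at index 3
```

If that assumption holds for the real column, each element would need its own single-index lookup (for example one UDF call for index 3 and another for index 1) rather than chained indexing.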