I have the result from a pyspark.ml.classification.LogisticRegression model. One of the columns is probability, which contains the following vector:

[Image: Probabilities column]

I want to extract the value at indices 3 and 1 and have written the following crude UDF, but it keeps returning an index error even on valid records.

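from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

def getPurchProb(o):
  try:
    # nested lookup as originally attempted; this is the line in question
    return o[3][1]
  except ValueError:
    return -1.0
  except IndexError:
    return -2.0


udfPurchProb = udf(getPurchProb, DoubleType())
result.select(udfPurchProb("probability"))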

What am I doing wrong?

Update

This issue was flagged as a duplicate of How to access element of a VectorUDT column in a Spark DataFrame?. I tried that approach before the one above and had identical errors. My issue is that I get an IndexError when trying to extract the value, not a ValueError on conversion.

Hans
  • No, this is not a duplicate, as I actually tried that method first and with the same result. – Hans Jun 29 '17 at 15:14
  • It is a duplicate (or non-existing problem). You are accessing a non-existent entry. I am not sure how you came up with `o[3][1]` (misinterpreted Databricks UI and off-by-one?) but what you want is `o[0]` or `o[1]`. There is no place for nesting here. – zero323 Jun 29 '17 at 20:41
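
To illustrate the comment above, here is a minimal sketch of a flat lookup, assuming the probability column holds a one-dimensional vector (for a binary model, two entries) so that a single index such as `o[1]` is all that is needed. The names `get_purch_prob` and `make_purch_prob_udf` are made up for this example, and the sentinel values are copied from the question. The `float()` call is there on the assumption that the vector yields NumPy scalars, which a `DoubleType` UDF may not accept directly.

```python
def get_purch_prob(v, index=1):
    """Return the entry at `index` of a flat vector, or a sentinel on failure."""
    try:
        # A single index, no nesting; float() turns a NumPy scalar
        # into a plain Python float.
        return float(v[index])
    except IndexError:
        # The vector has fewer than index + 1 entries.
        return -2.0
    except (TypeError, ValueError):
        # The value is missing (None) or cannot be converted to a float.
        return -1.0


def make_purch_prob_udf():
    # Hypothetical wrapper: pyspark is imported lazily so the helper
    # above can be tried without a Spark installation.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType
    return udf(get_purch_prob, DoubleType())


# Usage, assuming `result` is the DataFrame from the question:
# result.select(make_purch_prob_udf()("probability")).show()
```

With a two-element vector, index 3 does not exist, which would be consistent with the IndexError branch being hit on every record in the original UDF.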
