I trained an XGBoost classifier in PySpark and transformed some data with
outp = model.transform(inp)
Now outp contains a column 'probability' whose rows look like
Row(probability=DenseVector([0.99,0.01]))
I'd like to add a new column to outp that contains, as a plain float, the second component of that probability vector (e.g. just 0.01 instead of the whole Row(...)). What is the correct syntax for that?
I tried
outp = outp.select("*",(col('probability')[:,1]).alias('prob'))
expecting that the second element (index 1) of each vector in the column would be selected, but that syntax produces an error.