I applied a Logistic Regression in Spark to my data and now want to split the "probability" column (which is a DenseVector of 2 values each time).
It was a binary classification, so the 'probability' column holds pairs like:
>>> predictions.head(1)
[Row(scaledFeatures=SparseVector(10, {4: 4.398, 6: 2.2351}), rawPrediction=DenseVector([4.4453, -4.4453]), probability=DenseVector([0.9829, 0.0171]), prediction=0.0)]
How can I get a new column containing only the second value of 'probability' (here, 0.0171)?
I tried
predictions.withColumn('prob', predictions['probability'][1])
but got an error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark-client/python/pyspark/sql/dataframe.py", line 1314, in withColumn
    return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)
  File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/current/spark-client/python/pyspark/sql/utils.py", line 51, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"Can't extract value from probability#827;"
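The only workaround I can think of so far is to go through a UDF that pulls the element out of the vector and casts it to a plain float, roughly like this (second_element is just a name I made up for the sketch):

from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

# extract the second entry of the probability vector as a plain Python float
second_element = udf(lambda v: float(v[1]), DoubleType())
predictions = predictions.withColumn('prob', second_element(predictions['probability']))

But I'd prefer a built-in way to index into the vector column directly, if one exists.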