
I have a Spark DataFrame whose contents are as follows:

+----------------------------------------+
|probability                             |
+----------------------------------------+
|[0.42789998388333284,0.5721000161166672]|
|[0.42979424193820465,0.5702057580617953]|
|[0.4288468523208701,0.57115314767913]   |
+----------------------------------------+

The type of the "probability" column is:

org.apache.spark.sql.DataFrame = [probability: vector]

How can I split "probability" into two columns?

Thanks.

xuguozheng

1 Answer


You can do it like this using the Dataset API:

import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.ml.linalg.Vector

import spark.implicits._ // provides the tuple encoder needed by map

df
  .as[Vector](ExpressionEncoder(): Encoder[Vector]) // read the single vector column as a typed Dataset
  .map(v => (v(0), v(1)))                           // extract the two vector elements
  .toDF("prob1", "prob2")
  .show()
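If you are on Spark 3.0 or later, `vector_to_array` offers a purely column-based alternative that avoids the typed map and explicit encoder. A minimal sketch, assuming Spark 3.0+ and using hypothetical sample data shaped like the question's "probability" column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.ml.functions.vector_to_array
import org.apache.spark.ml.linalg.Vectors

val spark = SparkSession.builder().master("local[1]").appName("split-prob").getOrCreate()
import spark.implicits._

// Hypothetical sample data mirroring the question's vector column
val df = Seq(
  Tuple1(Vectors.dense(0.42789998388333284, 0.5721000161166672)),
  Tuple1(Vectors.dense(0.42979424193820465, 0.5702057580617953))
).toDF("probability")

val result = df
  .withColumn("arr", vector_to_array(col("probability"))) // vector -> array<double>
  .select(
    col("arr").getItem(0).as("prob1"), // first class probability
    col("arr").getItem(1).as("prob2")  // second class probability
  )

result.show()
```

Since everything stays in the DataFrame API, Catalyst can optimize the whole query, which a typed `map` over a Dataset prevents.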
Raphael Roth