
How can I get the first element of the probability vector from model results in a PySpark DataFrame?

+------+--------------------+
|labelh|         probability|
+------+--------------------+
|     1|[0.72498853530094...|
|     1|[0.99989771872286...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|
|     1|[0.72498853530094...|    
+------+--------------------+

The `probability` column holds two elements per row and has type `vector`. I need to extract the first element of each vector and add it to the DataFrame as a new column.

Thanks for your help.

Alper t. Turker
Juan David
  • It works, thank you! Question: why do you need to define a udf for this operation? I've seen this function, but I don't know how it works. – Juan David Jun 01 '18 at 12:51
  • `Vector` is implemented as a `UserDefinedType`, and there is simply no built-in function that can operate on it. There was an idea of [VectorDisasembler](https://stackoverflow.com/a/41639236/8371915) but it has been rejected for now. – Alper t. Turker Jun 01 '18 at 13:09
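
The comments above refer to a udf-based approach. A minimal sketch of what that could look like follows; the helper names `first_element` and `add_first_probability` and the output column name `prob_0` are hypothetical, chosen only for illustration. The udf is needed because the vector cannot be indexed with built-in column functions, so a Python function does the indexing row by row.

```python
def first_element(vector):
    """Return the first entry of a vector-like value as a plain Python float."""
    # Null rows stay null instead of raising inside the udf.
    if vector is None:
        return None
    # Indexing an ML vector yields a NumPy scalar; convert it to a built-in
    # float so it matches the DoubleType declared for the udf below.
    return float(vector[0])


def add_first_probability(df, input_col="probability", output_col="prob_0"):
    """Append output_col holding the first element of the vector in input_col."""
    # Imported here so first_element can be tried without a Spark install.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import DoubleType

    first_element_udf = udf(first_element, DoubleType())
    return df.withColumn(output_col, first_element_udf(col(input_col)))
```

With the DataFrame from the question, this would be used as `df = add_first_probability(df)` followed by `df.select("labelh", "prob_0").show()`. On Spark 3.0 and later, `pyspark.ml.functions.vector_to_array` may remove the need for a udf, since it converts the vector into an ordinary array column that can be indexed directly.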
