
I have a Spark DataFrame:

+-----------------+------------+--------------------+------------------+------------------+
|opp_id__reference|oplin_status|               stage|        std_amount|   std_line_amount|
+-----------------+------------+--------------------+------------------+------------------+
|OP-180618-7456377|     Pending|7 - Deliver & Val...|31395.462999391966|13072.069816517043|
|OP-180618-7456377|     Pending|7 - Deliver & Val...|31395.462999391966| 13.85958009943131|
+-----------------+------------+--------------------+------------------+------------------+

I would like to assign True (in a boolean `greater` column) to each opportunity line where std_line_amount >= 30% of the std_amount for its opp_id__reference.

The expected output:

542 OP-180112-6925769   Pending 7 - Deliver & Validate  363802.836296   31261.159197    False
543 OP-180112-6925769   Pending 7 - Deliver & Validate  363802.836296   46832.656747    False
544 OP-180112-6925769   Pending 7 - Deliver & Validate  363802.836296   118542.329840   False
359 OP-180222-7065558   Pending 7 - Deliver & Validate  2.434888e+05    670.785793  False
389 OP-160712-5051474   Pending 7 - Deliver & Validate  1.288711e+05    1288.780000 False
770 OP-180720-7563258   Pending 7 - Deliver & Validate  1.366182e+05    13.859580   False

With a pandas DataFrame I did this:

DF_BR6['greater']=DF_BR6.std_line_amount.gt(DF_BR6.groupby('opp_id__reference').std_amount.transform('sum')*0.3)
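To illustrate what that line computes, here is a minimal, self-contained reproduction on toy data (the values are invented for illustration; only the column names come from the question):

```python
import pandas as pd

# Toy data: two opportunity ids, with std_amount repeated per line as in the question
DF_BR6 = pd.DataFrame({
    "opp_id__reference": ["OP-1", "OP-1", "OP-2"],
    "std_amount": [100.0, 100.0, 50.0],
    "std_line_amount": [70.0, 10.0, 20.0],
})

# 30% of the per-group sum of std_amount: OP-1 -> 60.0, OP-2 -> 15.0
DF_BR6["greater"] = DF_BR6.std_line_amount.gt(
    DF_BR6.groupby("opp_id__reference").std_amount.transform("sum") * 0.3
)
print(DF_BR6["greater"].tolist())  # → [True, False, True]
```

Note that `transform('sum')` sums std_amount over all rows of the group, so when std_amount repeats on every line of an opportunity, the threshold is 30% of the repeated total, not of a single std_amount value.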

Can you help me achieve the same thing with a Spark DataFrame, please?

Thanks

Best,

Poisson
