I have a Spark DataFrame:
+-----------------+------------+--------------------+------------------+------------------+
|opp_id__reference|oplin_status| stage| std_amount| std_line_amount|
+-----------------+------------+--------------------+------------------+------------------+
|OP-180618-7456377| Pending|7 - Deliver & Val...|31395.462999391966|13072.069816517043|
|OP-180618-7456377| Pending|7 - Deliver & Val...|31395.462999391966| 13.85958009943131|
+-----------------+------------+--------------------+------------------+------------------+
I would like to flag each opportunity line as GREAT (a boolean greater column) when its std_line_amount is at least 30% of std_amount.
The expected output:
     opp_id__reference oplin_status                   stage     std_amount  std_line_amount  greater
542  OP-180112-6925769      Pending  7 - Deliver & Validate  363802.836296     31261.159197    False
543  OP-180112-6925769      Pending  7 - Deliver & Validate  363802.836296     46832.656747    False
544  OP-180112-6925769      Pending  7 - Deliver & Validate  363802.836296    118542.329840    False
359  OP-180222-7065558      Pending  7 - Deliver & Validate   2.434888e+05       670.785793    False
389  OP-160712-5051474      Pending  7 - Deliver & Validate   1.288711e+05      1288.780000    False
770  OP-180720-7563258      Pending  7 - Deliver & Validate   1.366182e+05        13.859580    False
With a pandas DataFrame I did this:
DF_BR6['greater'] = DF_BR6.std_line_amount.gt(
    DF_BR6.groupby('opp_id__reference').std_amount.transform('sum') * 0.3
)
Can you help me achieve the same thing with a Spark DataFrame?
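For reference, here is a minimal, untested PySpark sketch of the direction I have been exploring; it mirrors the pandas line by using a window over opp_id__reference to get the per-group sum of std_amount (df stands in for the DataFrame shown above):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Per-opportunity sum of std_amount, the Spark analogue of
# groupby('opp_id__reference').std_amount.transform('sum')
w = Window.partitionBy("opp_id__reference")

df = df.withColumn(
    "greater",
    F.col("std_line_amount") > F.sum("std_amount").over(w) * 0.3,
)
df.show()

Is a window function the right tool here, or is there a better groupBy/join idiom?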
Thanks
Best regards