I have the following Spark dataframe:
+--------+--------------+
|agent_id|payment_amount|
+--------+--------------+
|       a|          1000|
|       b|          1100|
|       a|          1100|
|       a|          1200|
|       b|          1200|
|       b|          1250|
|       a|         10000|
|       b|          9000|
+--------+--------------+
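In case it helps, this is roughly how the sample dataframe is built (the column names match the output above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# build the sample data shown above
test_df = spark.createDataFrame(
    [('a', 1000), ('b', 1100), ('a', 1100), ('a', 1200),
     ('b', 1200), ('b', 1250), ('a', 10000), ('b', 9000)],
    ['agent_id', 'payment_amount']
)
test_df.show()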
My desired output would be something like:
agent_id  95_quantile
a         whatever the 0.95 quantile is for agent a's payments
b         whatever the 0.95 quantile is for agent b's payments
For each agent_id group I need to calculate the 0.95 quantile, so I took the following approach:
test_df.groupby('agent_id').approxQuantile('payment_amount',0.95)
but I get the following error:
'GroupedData' object has no attribute 'approxQuantile'
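I know approxQuantile is available on a plain DataFrame (just not on GroupedData), so I can get the value for one agent by filtering first, but that only works one agent at a time and returns a plain Python list rather than a column:

# works for a single agent, but returns a Python list, not a new column
q_a = test_df.filter(test_df.agent_id == 'a') \
             .approxQuantile('payment_amount', [0.95], 0.01)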
I need the 0.95 quantile (percentile) in a new column so it can later be used for filtering.
I am using Spark 2.0.0.
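The closest workaround I've come across is calling percentile_approx through expr and joining the result back, but I'm not sure that function is actually usable in 2.0.0 (it may need Hive support or a newer Spark version), so I don't know if this is the right direction:

import pyspark.sql.functions as F

# sketch only: percentile_approx may require Hive support or a later Spark version
quantiles = test_df.groupBy('agent_id').agg(
    F.expr('percentile_approx(payment_amount, 0.95)').alias('95_quantile')
)
# attach the per-agent quantile as a column so it can be used for filtering
result = test_df.join(quantiles, on='agent_id', how='left')
filtered = result.filter(result.payment_amount < result['95_quantile'])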