I have PySpark dataframe below:
cust | amount |
----------------
A | 5 |
A | 1 |
A | 3 |
B | 4 |
B | 4 |
B | 2 |
C | 2 |
C | 1 |
C | 7 |
C | 5 |
I need to group by column 'cust'
and calculates the average per group.
Expected result:
cust | avg_amount
-------------------
A | 3
B | 3.333
C | 7.5
I've been using the code as below but giving me the error.
data.withColumn("avg_amount", F.avg("amount"))
Any idea how I can make this average?