I have the following DataFrame df:
customer_id  product_id  timestamp   action
111          1           1519030817  add
111          1           1519030917  remove
111          2           1519030819  add
222          2           1519030819  add
I want to group records by customer_id and product_id, and take the last action in each group.
This is what I did:
df.groupBy("customer_id","product_id").orderBy(desc("timestamp"))
But how can I actually take the latest action (the row with the highest timestamp) in each group?
The result should be the following:
customer_id  product_id  timestamp   action
111          1           1519030917  remove
111          2           1519030819  add
222          2           1519030819  add
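
For reference, a "latest row per group" result like this is often obtained with a window function rather than groupBy followed by orderBy. Below is a minimal sketch, assuming Spark's Scala API and a local SparkSession. The object name LatestAction, the helper column rn, and the in-memory sample data are assumptions made for illustration, not part of the original code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc, row_number}

object LatestAction {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption: no cluster needed).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("latest-action")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the table in the question.
    val df = Seq(
      (111, 1, 1519030817L, "add"),
      (111, 1, 1519030917L, "remove"),
      (111, 2, 1519030819L, "add"),
      (222, 2, 1519030819L, "add")
    ).toDF("customer_id", "product_id", "timestamp", "action")

    // One window per (customer_id, product_id), newest timestamp first.
    val w = Window
      .partitionBy("customer_id", "product_id")
      .orderBy(desc("timestamp"))

    // Number the rows inside each window and keep only the newest one.
    // "rn" is a hypothetical helper column, dropped afterwards.
    val latest = df
      .withColumn("rn", row_number().over(w))
      .filter(col("rn") === 1)
      .drop("rn")

    latest.orderBy("customer_id", "product_id").show()

    spark.stop()
  }
}
```

If two rows in the same group share the same timestamp, row_number picks one of them arbitrarily, so a tie-breaking column would have to be added to the orderBy if that case matters.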