I am new to Spark and need help implementing some logic using the DataFrame API. Assume I have a DataFrame df1 with the following data.
df1:
txn-id,productid,desc
1,'AA','ADESC'
2,'BB','BDESC'
3,'CC','CDESC'
4,'BB','ZDESC'
5,'CC','YDESC'
I want the output in the format below, using the DataFrame API (without Spark SQL). Basically, I want to group by productid and, for each group, select the row with the maximum txn-id, so that I get both that txn-id and the desc belonging to that transaction.
Result:
txn-id,productid,desc
1,'AA','ADESC'
4,'BB','ZDESC'
5,'CC','YDESC'
Could you please help me with the logic?
Thanks, Sumit
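
For reference, here is a minimal sketch of the logic described above. The post does not say which language binding is in use, so Python (PySpark) is an assumption, and the function names latest_per_product and latest_per_product_df are hypothetical. The first function states the intended semantics in plain Python so the expected result can be checked by hand. The second shows one common DataFrame-only way to get the same rows: rank the rows of each productid by txn-id in descending order with a window and keep the first one.

```python
def latest_per_product(rows):
    """Plain-Python reference: per productid, keep the row with the largest txn-id."""
    latest = {}
    for txn_id, productid, desc in rows:
        current = latest.get(productid)
        if current is None or txn_id > current[0]:
            latest[productid] = (txn_id, productid, desc)
    # Sort by txn-id so the output order is deterministic.
    return sorted(latest.values())


def latest_per_product_df(df, key_col="productid", order_col="txn-id"):
    """Same logic on a Spark DataFrame, using only the DataFrame API (no Spark SQL)."""
    # Imported inside the function so the plain-Python part above
    # still runs on a machine without PySpark installed.
    from pyspark.sql import Window
    from pyspark.sql import functions as F

    # Within each productid, order rows from highest to lowest txn-id.
    w = Window.partitionBy(key_col).orderBy(F.col(order_col).desc())
    return (
        df.withColumn("_rn", F.row_number().over(w))  # 1 = highest txn-id in the group
        .filter(F.col("_rn") == 1)                    # keep only that row
        .drop("_rn")                                  # remove the helper column
    )


# Sample data from the question.
rows = [
    (1, "AA", "ADESC"),
    (2, "BB", "BDESC"),
    (3, "CC", "CDESC"),
    (4, "BB", "ZDESC"),
    (5, "CC", "YDESC"),
]

print(latest_per_product(rows))
# [(1, 'AA', 'ADESC'), (4, 'BB', 'ZDESC'), (5, 'CC', 'YDESC')]
```

Assuming an existing SparkSession named spark, the DataFrame version could be tried with df1 = spark.createDataFrame(rows, ["txn-id", "productid", "desc"]) followed by latest_per_product_df(df1).show(). Spark does not guarantee the order of the returned rows unless an explicit orderBy is added, and if two rows of one product could share the same txn-id, row_number would pick one of them arbitrarily.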