Below is an example of the CSV file I'm working on:
life id,policy id,benefit id,date of commencment,status
xx_0,0,0,11/11/2017,active
xx_0,0,0,12/12/2017,active
axb_0,1,0,10/01/2015,active
axb_0,1,0,11/10/2014,active
fxa_2,0,1,01/02/203,active
What I want to do is group the data by (lifeid + policyid + benefitid), sort each group by date, and then take the most recent (last) element of each group to run some checks on it.
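To pin down the result I expect, here is a rough plain-Python sketch of the logic (this is not Spark code, just an illustration). It assumes the dates are day/month/year, and the helper names (`group_key`, `commencement`, `latest_per_group`) are made up for this example. I left out the fxa_2 row because its year (203) does not parse with a four-digit year format.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the CSV above: (life id, policy id, benefit id, date, status).
# The fxa_2 row is omitted because its year does not match "%Y".
rows = [
    ("xx_0", 0, 0, "11/11/2017", "active"),
    ("xx_0", 0, 0, "12/12/2017", "active"),
    ("axb_0", 1, 0, "10/01/2015", "active"),
    ("axb_0", 1, 0, "11/10/2014", "active"),
]


def group_key(row):
    # Grouping key: (life id, policy id, benefit id).
    return row[:3]


def commencement(row):
    # Assumption: dates are formatted as day/month/year.
    return datetime.strptime(row[3], "%d/%m/%Y")


def latest_per_group(rows):
    # Sort by key, then by date, so the last row of each group is the most recent.
    ordered = sorted(rows, key=lambda r: (group_key(r), commencement(r)))
    return [list(group)[-1] for _, group in groupby(ordered, key=group_key)]


latest = latest_per_group(rows)
for row in latest:
    print(row)
```

So for each (lifeid, policyid, benefitid) combination I want to end up with exactly one row, the one with the latest date.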
What's the best way to do this in Spark?