> df <- data.frame('unique_ref' = c("a_2016","a_2017","a_2017","a_2016","a_2016"),
+ 'trans_type' = c('NB','NB','CANC','MTA','MTA'),
+ 'incept_dt' = c('01/01/2016','01/01/2017','01/01/2017','01/01/2016','01/01/2016'),
+ 'exp_dt' = c('31/12/2016','31/12/2017','31/12/2017','31/12/2016','31/12/2016'),
+ 'trans_dt' = c('01/01/2016','01/01/2017','01/03/2017','01/07/2016','01/09/2016'))
> df
unique_ref trans_type incept_dt exp_dt trans_dt
1 a_2016 NB 01/01/2016 31/12/2016 01/01/2016
2 a_2017 NB 01/01/2017 31/12/2017 01/01/2017
3 a_2017 CANC 01/01/2017 31/12/2017 01/03/2017
4 a_2016 MTA 01/01/2016 31/12/2016 01/07/2016
5 a_2016 MTA 01/01/2016 31/12/2016 01/09/2016
I have the above dataset format which has a unique_ref and a few dates. I want to be able to sort this dataset by the unique ref and incept_dt and trans_dt:
> df %>% arrange(unique_ref,incept_dt,trans_dt)
unique_ref trans_type incept_dt exp_dt trans_dt
1 a_2016 NB 01/01/2016 31/12/2016 01/01/2016
2 a_2016 MTA 01/01/2016 31/12/2016 01/07/2016
3 a_2016 MTA 01/01/2016 31/12/2016 01/09/2016
4 a_2017 NB 01/01/2017 31/12/2017 01/01/2017
5 a_2017 CANC 01/01/2017 31/12/2017 01/03/2017
Now with this sorted dataset I want to create a new column called trans_end_dt which looks at the row below and picks up that rows trans_dt less 1 day. It should do this for every unique_ref, but stop once it has reached the final unique_ref in that grouping and pick up the exp_dt. I.e. the result should be:
> df %>% arrange(unique_ref,incept_dt,trans_dt)
unique_ref trans_type incept_dt exp_dt trans_dt trans_end_dt
1 a_2016 NB 01/01/2016 31/12/2016 01/01/2016 30/06/2016 #this is 01/07/2016 minus one day
2 a_2016 MTA 01/01/2016 31/12/2016 01/07/2016 31/08/2016 #same logic as above
3 a_2016 MTA 01/01/2016 31/12/2016 01/09/2016 31/12/2016 #next row is a new unique_ref so the value should just be the exp_dt which is 31/12/2016
4 a_2017 NB 01/01/2017 31/12/2017 01/01/2017 28/02/2017
5 a_2017 CANC 01/01/2017 31/12/2017 01/03/2017 31/12/2017
Does anyone know how I can do this? Preferably using dplyr but I am struggling to get this to work so any solution would be great