I'm trying to populate "FinalDate" based on "ExpectedDate" and "ObservedDate".

The rules are: for each group, if observed date is greater than previous expected date and less than the next expected date then final date is equal to observed date, otherwise final date is equal to expected date.

How can I modify the code below to make sure that:

  • FinalDate is filled in by Group
  • Iteration numbers don't skip any rows

library(dplyr)

set.seed(2)
dat <- data.frame(Group = sample(LETTERS[1:10], 100, replace = TRUE),
                  Date = sample(seq(as.Date("2013/01/01"), as.Date("2020/01/01"), by = "day"), 100)) %>%
  mutate(ExpectedDate = Date + sample(10:200, 100, replace = TRUE),
         ObservedDate = Date + sample(10:200, 100, replace = TRUE)) %>%
  group_by(Group) %>%
  arrange(Date) %>%
  mutate(n = row_number()) %>%   # row index within each group, ordered by Date
  arrange(Group) %>%
  ungroup() %>%
  as.data.frame()

#generate some missing values in "ObservedDate"
dat[sample(nrow(dat),20), "ObservedDate"]<-NA

dat$FinalDate<-NA
for (i in 1:nrow(dat)){
  dat[i, "FinalDate"]<-if_else(!is.na(dat$"ObservedDate")[i] && 
                                 dat[i, "ObservedDate"] > dat[i-1, "ExpectedDate"] && 
                                 dat[i, "ObservedDate"] < dat[i+1, "ExpectedDate"],
                               dat[i, "ObservedDate"],
                               dat[i,"ExpectedDate"])
}
dat$FinalDate<-as.Date(dat$FinalDate) # convert numeric to Date format

For example, in the output below:

At i = 90, the code takes the previous ExpectedDate from group I.

We want it to look for the previous ExpectedDate only within group J. If there is no previous ExpectedDate for a group, and ObservedDate is greater than ExpectedDate but less than the next ExpectedDate, then FinalDate should be filled with ExpectedDate.

At i = 100, the code generates NA because there is no next observation available.

We want this value to be filled in so that, for the last observation in each group, FinalDate = ObservedDate if ObservedDate is greater than that last ExpectedDate within the group, and FinalDate = ExpectedDate otherwise.

    Group       Date ExpectedDate ObservedDate  n  FinalDate
88      I 2015-09-07   2015-12-05         <NA>  7 2015-12-05
89      I 2018-08-02   2018-11-01   2018-08-13  8 2018-11-01
90      J 2013-07-24   2013-08-30   2013-08-12  1 2013-08-30
91      J 2013-11-22   2014-01-02   2014-04-05  2 2014-04-05
92      J 2014-11-03   2015-03-23   2015-05-10  3 2015-05-10
93      J 2015-08-30   2015-12-09   2016-02-04  4 2016-02-04
94      J 2016-04-18   2016-09-03         <NA>  5 2016-09-03
95      J 2016-10-10   2017-01-29   2017-04-14  6 2017-04-14
96      J 2017-02-14   2017-07-05         <NA>  7 2017-07-05
97      J 2017-04-21   2017-10-01   2017-08-26  8 2017-08-26
98      J 2017-10-01   2018-01-27   2018-02-28  9 2018-02-28
99      J 2018-08-03   2019-01-31   2018-10-20 10 2018-10-20
100     J 2019-04-25   2019-06-23   2019-08-16 11       <NA>
Mahim
  • You may wanna read this as well: https://stackoverflow.com/questions/6558921/boolean-operators-and – M-- Jan 19 '20 at 06:30

1 Answer

We can drop the for loop and use group_by, lag and lead from dplyr:

library(dplyr)

dat %>%
  group_by(Group) %>%
  mutate(FinalDate = if_else(ObservedDate > lag(ExpectedDate) & 
                    ObservedDate < lead(ExpectedDate), ObservedDate, ExpectedDate)) 
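
Note that lag() and lead() return NA at the first and last row of each group, and ObservedDate itself can be NA, so the condition above evaluates to NA on those rows and if_else() then returns NA. The following is a minimal sketch of one way to cover the edge cases described in the question, not a definitive implementation. The data frame toy and the helper columns prev_exp, next_exp and use_obs are hypothetical names introduced for illustration. It assumes rows are already sorted by date within each group, that ExpectedDate is never missing, and that the first row of a group always keeps ExpectedDate.

```r
library(dplyr)

# Hypothetical toy data: two groups, already sorted by date within each group.
toy <- data.frame(
  Group        = c("A", "A", "A", "B", "B"),
  ExpectedDate = as.Date(c("2020-01-10", "2020-03-10", "2020-06-10",
                           "2020-02-01", "2020-05-01")),
  ObservedDate = as.Date(c("2020-01-20", "2020-04-01", "2020-07-01",
                           NA,           "2020-04-15"))
)

result <- toy %>%
  group_by(Group) %>%
  mutate(
    prev_exp = lag(ExpectedDate),    # NA on the first row of each group
    next_exp = lead(ExpectedDate),   # NA on the last row of each group
    use_obs = case_when(
      is.na(ObservedDate) ~ FALSE,                        # nothing observed
      is.na(prev_exp)     ~ FALSE,                        # first row (assumption): keep ExpectedDate
      is.na(next_exp)     ~ ObservedDate > ExpectedDate,  # last row: compare with its own ExpectedDate
      TRUE                ~ ObservedDate > prev_exp & ObservedDate < next_exp
    ),
    FinalDate = if_else(use_obs, ObservedDate, ExpectedDate)
  ) %>%
  ungroup() %>%
  select(-prev_exp, -next_exp, -use_obs)

print(result)
```

Because case_when() picks the first matching condition, the order of the branches matters: the missing-value and first-row checks come before the last-row check, so a group with a single row falls back to ExpectedDate under these assumptions.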

We can also do this with data.table::between:

dat %>%
  group_by(Group) %>%
  mutate(FinalDate = if_else(data.table::between(ObservedDate, 
           lag(ExpectedDate), lead(ExpectedDate)), ObservedDate, ExpectedDate)) 
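
One detail worth checking before swapping in between(): by default data.table::between() includes both bounds (incbounds = TRUE), whereas the rule in the question uses strict "greater than" and "less than". The sketch below illustrates the difference on plain numbers; the vectors x, lower and upper are made-up values, and data.table is assumed to be installed.

```r
# Made-up values chosen so that two of them sit exactly on the bounds.
x     <- c(5, 7, 10)
lower <- 5
upper <- 10

# Default behaviour: bounds are included, i.e. x >= lower & x <= upper.
inclusive <- data.table::between(x, lower, upper)

# Strict comparison: x > lower & x < upper, matching the rule in the question.
strict <- data.table::between(x, lower, upper, incbounds = FALSE)

print(inclusive)
print(strict)
```

If an ObservedDate can fall exactly on a neighbouring ExpectedDate, passing incbounds = FALSE keeps the between() version consistent with the > and < comparison.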
Ronak Shah
  • `Error in data.table::between(ObservedDate, lag(ExpectedDate), lead(ExpectedDate)) : Not yet implemented NAbounds=TRUE for this non-numeric and non-character type` You cannot use `data.table::between` here. – M-- Jan 19 '20 at 06:42
  • @M-- works fine for me for the data generated by OP and gives the same output as the `dplyr` one. – Ronak Shah Jan 19 '20 at 07:03