I have data that looks similar to the following except with hundreds of IDs and thousands of observations:
ID date measles
1 2008-09-12 1
1 2008-10-25 NA
1 2009-01-12 1
1 2009-03-12 NA
1 2009-05-12 1
2 2010-05-17 NA
2 2010-06-12 NA
2 2010-07-02 1
2 2010-08-13 NA
I want to create a variable that will store the previous date for each pid like the following:
ID date measles previous_date
1 2008-09-12 1 NA
1 2008-10-25 NA 2008-09-12
1 2009-01-12 1 2008-10-25
1 2009-03-12 NA 2009-01-12
1 2009-05-12 1 2009-03-12
2 2010-05-17 NA NA
2 2010-06-12 NA 2010-05-17
2 2010-07-02 1 2010-06-12
2 2010-08-13 NA 2010-07-02
This should be an extremely easy task, but I have been unsuccessful at getting a lag variable to work properly. I have tried a few methods, such as the following:
dt[, previous_date:=c(NA, current_date[-.N]), by=c("ID")]
dt[,previous_date:=current_date-shift(current_date,1,type="lag"),by=ID]
The code samples above either produce sporadic numbers in the previous_date variable or produce all NAs. I'm not sure why this is? Is it because I'm using a date variable as opposed to an integer?
Is there a better way to accomplish this task that would work for a date variable?