I have a set of dates and times for several individuals (ID
) that correspond to our primary outcome measure (Y
) and a covariate (X1
).
My objective is to replace missing X1
values for each of the Y
rows if the X1
measurement was recorded within a +/- 24 hour period from the date/time that the Y
variable was measured. To make this easier to visualize (and load into R), here is how the data are currently arranged:
structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L), TIME = structure(1:15, .Label = c("01/01/2013 12:01",
"01/03/2013 08:49", "01/03/2013 20:52", "02/01/2013 05:00", "02/03/2013 05:30",
"02/03/2013 21:14", "02/05/2013 05:15", "02/12/2013 05:03", "02/15/2013 04:16",
"02/16/2013 04:12", "02/16/2013 21:02", "03/01/2010 17:58", "03/02/2010 00:10",
"03/03/2010 10:45", "03/04/2010 09:00"), class = "factor"), Y = structure(c(1L,
5L, 7L, 1L, 1L, 2L, 1L, 1L, 1L, 4L, 3L, 1L, 8L, 1L, 6L), .Label = c(".",
"22", "35", "4", "5", "6", "8", "9"), class = "factor"), X1 = structure(c(2L,
1L, 1L, 7L, 7L, 1L, 4L, 4L, 3L, 1L, 1L, 6L, 1L, 5L, 1L), .Label = c(".",
"0.1", "0.2", "0.4", "0.6", "0.9", "1.0"), class = "factor")), .Names = c("ID",
"TIME", "Y", "X1"), class = "data.frame", row.names = c(NA, -15L))
To simplify the desired output, I would like to only display the rows with non-missing Y
values, such that the end product would look like this:
ID TIME Y X1
1 1 01/03/2013 08:49 5 .
2 1 01/03/2013 20:52 8 .
3 2 02/03/2013 21:14 22 .
4 2 02/16/2013 04:12 4 0.2
5 2 02/16/2013 21:02 35 .
6 3 03/02/2010 00:10 9 0.9
7 3 03/04/2010 09:00 6 0.6
Is it possible to (1) iterate across multiple rows and evaluate the absolute value of 24 hours to get the difference between the X1
and Y
measurements and (2) to replace the missing values of X1
with those that are within the +/- 24 hour window?
Any thoughts on how to go about this would be greatly appreciated!