I am trying to create a program that selects the closest day within each 30-day window, for up to 900 days (1-30, 31-60, 61-90, ..., 871-900). I am using R version 3.3.3.
Here is an example of the dataset I have:
have <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L,
5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L), time.to.first = c(0L, 78L, 293L, 0L,
63L, 0L, 89L, 0L, 11L, 27L, 0L, 28L, 0L, 29L, 0L, 31L, 381L,
778L, 0L, 28L, 69L, 96L, 466L, 0L, 28L, 56L, 98L, 154L, 220L,
294L, 395L, 507L), visit = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L,
2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)), .Names = c("id", "time.to.first",
"visit"), row.names = c(NA, 32L), class = "data.frame")
Here is what I would like:
want <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L,
5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L), time.to.first = c(0L, 78L, 293L, 0L,
63L, 0L, 89L, 0L, 11L, 27L, 0L, 28L, 0L, 29L, 0L, 31L, 381L,
778L, 0L, 28L, 69L, 96L, 466L, 0L, 28L, 56L, 98L, 154L, 220L,
294L, 395L, 507L), visit = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L,
2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), time.window = structure(c(1L,
11L, 5L, 1L, 11L, 1L, 11L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 6L,
7L, 12L, 1L, 2L, 11L, 13L, 9L, 1L, 2L, 6L, 13L, 3L, 4L, 5L, 8L,
10L), .Label = c("", "1-30", "151-180", "211-240", "271-300",
"31-60", "361-390", "391-420", "451-480", "481-510", "61-90",
"751-780", "91-120"), class = "factor")), .Names = c("id", "time.to.first",
"visit", "time.window"), row.names = c(NA, 32L), class = "data.frame")
I was able to figure out how to create the date range for the first window (1-30 days) using a series of ifelse statements and filter with a left_join:
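library(dplyr)

x <- 1   # first day of the window
y <- 30  # last day of the window
df <- have %>%
  group_by(id) %>%
  # flag visits inside the window; flag2 marks the flagged rows
  mutate(flag = ifelse(time.to.first >= x & time.to.first <= y, max(visit), ""),
         flag2 = ifelse(flag == max(flag) & flag != "", 1, "")) %>%
  filter(flag > 0 & flag2 == 1) %>%
  # keep the last visit within the window for each id
  filter(visit == max(visit)) %>%
  mutate(time = paste(x, "-", y, sep = "")) %>%
  dplyr::select(time, id, visit) %>%
  # join the window label back onto the full dataset
  left_join(have, ., by = c("id", "visit"))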
I was thinking I could use a double nested for loop over the x and y variables to create a program that would handle the rest of the date ranges, but I understand that nested loops might not be the most efficient way to go about this.
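One loop-free direction that might be worth exploring is base R's cut(), which assigns every value to an interval in a single vectorized call. The snippet below is only a rough sketch: the names days, width, breaks, labels and window are made up for illustration, and it only bins the days, it does not yet pick one visit per window.

```r
# Illustrative day values (a few of the time.to.first values from `have`)
days <- c(0L, 11L, 27L, 31L, 381L, 778L)

width <- 30                           # assumed window width in days
breaks <- seq(0, 900, by = width)     # 0, 30, 60, ..., 900

# Build labels "1-30", "31-60", ..., "871-900" from the break points
labels <- paste0(head(breaks, -1) + 1, "-", breaks[-1])

# Intervals are (0, 30], (30, 60], ...; day 0 falls outside and becomes NA
window <- cut(days, breaks = breaks, labels = labels)
as.character(window)
```

Because cut() uses right-closed intervals by default, day 30 lands in "1-30" and day 31 in "31-60", and day 0 gets NA instead of a label.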
I was trying to think of a way to make the program a little more robust so I could change the width of the window (from 30 days to 90, 180, 360, etc.), but I am not sure how to approach this.
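One way the window width could become a parameter is to compute the window number arithmetically and then keep the label only on the latest visit in each id/window pair. This is a minimal sketch under assumptions, not a tested solution: label_windows, width and max_day are hypothetical names, "closest" is taken to mean the largest time.to.first inside the window (which is what the want table appears to show), and max_day is assumed to be a multiple of width.

```r
# Hypothetical helper (not from the original code): label the latest visit
# that falls in each fixed-width window, separately for each id.
label_windows <- function(df, width = 30, max_day = 900) {
  day <- df$time.to.first
  # Window index: days 1..width -> 1, (width + 1)..(2 * width) -> 2, ...
  k <- ceiling(day / width)
  in_range <- day >= 1 & day <= max_day
  label <- sprintf("%d-%d",
                   as.integer((k - 1) * width + 1),
                   as.integer(k * width))
  # Largest day within each id/window pair, repeated for every row of the pair
  key <- paste(df$id, k)
  latest <- day == ave(day, key, FUN = max)
  # Rows outside the range, or not the latest in their window, get ""
  df$time.window <- ifelse(in_range & latest, label, "")
  df
}

# Rows for ids 4 and 7, taken from `have`
toy <- data.frame(id = c(4L, 4L, 4L, 7L, 7L, 7L, 7L),
                  time.to.first = c(0L, 11L, 27L, 0L, 31L, 381L, 778L))
out30 <- label_windows(toy, width = 30)
out90 <- label_windows(toy, width = 90)
```

With width = 30, id 4 keeps "1-30" only on day 27 (day 11 is in the same window but earlier), and changing width to 90 re-bins everything without touching the rest of the code.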
I do not want the code written for me, but I would love ideas on functions or examples that you think might be helpful. I have been having a difficult time finding more information on this type of program, so any additional links would be helpful!