I could use your help with the following:
I have a data frame with "visit" and "days in study" per participant, so showing the number of days that the participant was in the study on a given visit:
visits <- data.frame(Subj=rep(paste0("S",c(1:3)), each=3),
Visit=rep(paste0("V", c(1:3)), times=3), DaysInStudy =
c(0,8,17,-1,7,15,-1,9,17))
I have another data frame with scores per day per participant:
set.seed(123)
scores <- data.frame(Subj=rep(paste0("S",c(1:3)), each=27),
DaysInStudy=c(-9:17), scores=sample(9))
I want to add the number of the visit (V1-V3) to the scores data frame to all rows that preceed that visit by 6 days (based on the DaysInStudy columns). In my data, there are no overlapping intervals and days outside the intervals should be set to NA.
This is what the results should look like:
Subj DaysInStudy scores visit
1 S1 -9 3 NA
2 S1 -8 7 NA
3 S1 -7 9 NA
4 S1 -6 6 V1
5 S1 -5 5 V1
....
10 S1 0 3 V1
11 S1 1 7 NA
12 S1 2 9 V2
I hope my question is clear. I'm sure there is a straightforward solution but I can't think of any at the moment. Thank you for your help!
Edit: I came up with something like this:
scores$visit <- visits$Visit[match(paste(scores$Subj, scores$DaysInStudy), paste(visits$Subj, visits$DaysInStudy))]
scores %>%
mutate(visit1 = lead(visit), visit2 = lead(visit,2), visit3 = lead(visit,3), visit4 = lead(visit,4),
visit5 = lead(visit,5), visit6 = lead(visit,6),
visit = coalesce(visit, visit1, visit2, visit3, visit4, visit5, visit6)) %>%
select(-c(visit1, visit2, visit3, visit4, visit5, visit6))
However, this is not a perfect solution, because in my actual dataset, I have more measurements on one day, so I would like to add the visits to all rows in which DaysInStudy are equal to the DaysInStudy of the visit and the 6 days prior to it.