0

I could use your help with the following:

I have a data frame with "visit" and "days in study" per participant, so showing the number of days that the participant was in the study on a given visit:

visits <- data.frame(Subj=rep(paste0("S",c(1:3)), each=3), 
          Visit=rep(paste0("V", c(1:3)), times=3), DaysInStudy =
          c(0,8,17,-1,7,15,-1,9,17))

I have another data frame with scores per day per participant:

set.seed(123)
scores <- data.frame(Subj=rep(paste0("S",c(1:3)), each=27), 
          DaysInStudy=c(-9:17), scores=sample(9))

I want to add the number of the visit (V1-V3) to the scores data frame to all rows that preceed that visit by 6 days (based on the DaysInStudy columns). In my data, there are no overlapping intervals and days outside the intervals should be set to NA.

This is what the results should look like:

    Subj DaysInStudy scores visit
1    S1          -9      3    NA
2    S1          -8      7    NA
3    S1          -7      9    NA
4    S1          -6      6    V1
5    S1          -5      5    V1
....
10   S1           0      3    V1
11   S1           1      7    NA 
12   S1           2      9    V2

I hope my question is clear. I'm sure there is a straightforward solution but I can't think of any at the moment. Thank you for your help!

Edit: I came up with something like this:

scores$visit <- visits$Visit[match(paste(scores$Subj, scores$DaysInStudy), paste(visits$Subj, visits$DaysInStudy))]

scores %>% 
  mutate(visit1 = lead(visit), visit2 = lead(visit,2), visit3 = lead(visit,3), visit4 = lead(visit,4),
   visit5 = lead(visit,5), visit6 = lead(visit,6),
   visit = coalesce(visit, visit1, visit2, visit3, visit4, visit5, visit6)) %>%
   select(-c(visit1, visit2, visit3, visit4, visit5, visit6))

However, this is not a perfect solution, because in my actual dataset, I have more measurements on one day, so I would like to add the visits to all rows in which DaysInStudy are equal to the DaysInStudy of the visit and the 6 days prior to it.

user3640617
  • 1,546
  • 13
  • 21
  • Something like `merge(scores, visits, by = c("Subj", "DaysInStudy"), all.x = TRUE)` should work, though with your seed of 123, I see subject `S1` visiting twice on `-1`, resulting in duplicate rows. (You can easily handle this with some aggregation/grouping.) – r2evans Sep 29 '17 at 21:15
  • I'm sorry, there was a mistake in the data.frame which I corrected. I also updated my answer to better explain what I mean. – user3640617 Sep 29 '17 at 21:39
  • So my `merge` command still stands ... can you explain what it is not doing correctly? – r2evans Sep 29 '17 at 21:51
  • You mean DaysInStudy in scores data frame should precede 6 days than DaysInStudy in visits data frame? – Santosh M. Sep 29 '17 at 22:03

2 Answers2

1

You will need to join on specific conditions, for example using sqldf:

sqldf:::sqldf({"
  SELECT s.*, v.Visit 
  FROM scores s
  LEFT JOIN visits v on
    s.Subj = v.subj and
    s.DaysInStudy >= v.DaysInStudy - 6 and
    s.DaysInStudy <= v.DaysInStudy"})


# Subj DaysInStudy scores Visit
# 1    S1          -9      3  <NA>
# 2    S1          -8      7  <NA>
# 3    S1          -7      9  <NA>
# 4    S1          -6      6    V1
# 5    S1          -5      5    V1
# 6    S1          -4      1    V1
# 7    S1          -3      2    V1
# 8    S1          -2      8    V1
# 9    S1          -1      4    V1
# 10   S1           0      3    V1
# 11   S1           1      7  <NA>
GoGonzo
  • 2,637
  • 1
  • 18
  • 25
0

Let's use magical dplyr

library(dplyr)
df = visits %>%
  left_join(scores, by = c("Subj")) %>%
  mutate(Visit = replace(Visit, !(DaysInStudy.y >= DaysInStudy.x - 6
                             & DaysInStudy.y <= DaysInStudy.x), NA)) %>%
  select(-DaysInStudy.x, -visit, -Visit, Subj, DaysInStudy = DaysInStudy.y, scores, Visit)


#Showing top 10 rows of output df: head(df, n=10)

    Subj DaysInStudy scores Visit
1    S1          -9      3   <NA>
2    S1          -8      7   <NA>
3    S1          -7      9   <NA>
4    S1          -6      6    V1
5    S1          -5      5    V1
6    S1          -4      1    V1
7    S1          -3      2    V1
8    S1          -2      8    V1
9    S1          -1      4    V1
10   S1           0      3    V1
Santosh M.
  • 2,356
  • 1
  • 17
  • 29
  • Thank you for your help, but it doesn't work perfectly, because in the output, all days are repeated for each visit. – user3640617 Oct 01 '17 at 12:41