I have a grouped data frame (meaning that I have a list of patients and each patient has several treatments). I now want to remove all rows, that do not meet a specific criteria:
The last column contains yes and no. I only want the rows that contains the last no and the first yes.
PATIENT.ID Caffeinefactor
---------------------------
21 no
21 no
21 no
21 yes
21 yes
34 no
34 no
34 yes
34 yes
16 no
16 no
DF = structure(list(PATIENT.ID = c(210625L, 210625L, 210625L, 210625L,
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 210625L,
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 220909L,
220909L, 220909L, 220909L, 220909L, 220909L, 220909L, 220909L,
220909L, 220909L, 221179L, 221179L, 221179L, 221179L, 221179L,
221179L, 221179L, 221179L, 221179L, 221179L, 221179L, 221179L,
221179L, 221179L, 301705L, 301705L, 301705L, 301705L, 301705L,
301705L, 301705L, 301705L, 301705L, 301705L, 301705L, 301705L,
301705L, 301705L, 301705L, 303926L, 303926L, 303926L, 303926L
), PATIENT.TREATMENT.NUMBER = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 17L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 12L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L), Caffeinefactor = c("no",
"no", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes",
"yes", "yes", "no", "no", "yes", "yes", "yes", "yes", "yes",
"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no",
"no", "no", "no", "no", "no", "no", "yes", "yes", "yes", "yes",
"yes", "no", "no", "no", "no", "no", "no", "yes", "no", "yes",
"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"
)), row.names = c(NA, -60L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x7fe7f7002ee0>)
Expected Output:
PATIENT.ID Caffeinefactor
---------------------------
21 no
21 yes
34 no
34 yes
I started off with this code:
daf1 <- df %>%
group_by(PATIENT.ID) %>%
mutate(firstc = (Caffeinefactor == 'yes' & lag(Caffeinefactor) == 'no')) %>%
# Find the first row with `double_B` in each group, filter out rows after it
filter(row_number() <= min(which(firstc == TRUE)))
however, I cannot seem to solve the problem, to delete everything before AND after the two needed rows.