I'm trying to select the groups in a grouped data frame that contain a specific string in a specific row of each group.
Consider the following df:
df <- data.frame(id = c(rep("id_1", 4),
                        rep("id_2", 4),
                        rep("id_3", 4)),
                 string = c("here",
                            "is",
                            "some",
                            "text",
                            "here",
                            "is",
                            "other",
                            "text",
                            "there",
                            "are",
                            "final",
                            "texts"))
I want to create a data frame that contains only the groups that have the word "is" in the second row.
Here is my attempt, which does not work:
desired_df <- df %>% group_by(id) %>%
  filter(slice(select(., string), 2) %in% "is")
Here is the desired output:
desired_df <- data.frame(id = c(rep("id_1", 4),
                                rep("id_2", 4)),
                         string = c("here",
                                    "is",
                                    "some",
                                    "text",
                                    "here",
                                    "is",
                                    "other",
                                    "text"))
I've looked here, but that doesn't solve my issue because it finds groups with any occurrence of the specified string, not only an occurrence in a given row.
I could also do this in two steps, first identifying the matching ids and then using them to subset the original df, like so:
ids <- df %>% group_by(id) %>% slice(2) %>% filter(string %in% "is") %>% select(id)
desired_df <- df %>% filter(id %in% ids$id)
But I'm wondering if I can do something simpler within a single pipe chain.
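For what it's worth, here is a minimal sketch of what I imagine a single-pipe version could look like. It assumes that an expression inside `filter()` on a grouped data frame is evaluated once per group, so a length-one result keeps or drops the whole group; `result` is just a name I made up for illustration, and I'm not sure this is the most idiomatic way:

```r
library(dplyr)

# Same example data as above, built more compactly
df <- data.frame(id = rep(c("id_1", "id_2", "id_3"), each = 4),
                 string = c("here", "is", "some", "text",
                            "here", "is", "other", "text",
                            "there", "are", "final", "texts"))

# Within each id, look at the second value of `string`.
# The comparison yields a single TRUE/FALSE per group, which filter()
# applies to every row of that group. Groups with fewer than two rows
# give NA and are dropped.
result <- df %>%
  group_by(id) %>%
  filter(nth(string, 2) == "is") %>%
  ungroup()
```

Writing `string[2] == "is"` instead of `nth(string, 2) == "is"` should behave the same way here, since base indexing past the end of a vector also gives NA.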
Help appreciated!