I posted this question on 12/19. I received one response that was very helpful but not quite what I was looking for. Then the question was closed by three users on the grounds that it needed more focus. The instructions indicated I could update the question or post a new one, but even after I edited it to be more focused, it remained closed. So I am posting it again.
Here is the link to the edited question, which now includes a more concise dataset (the size of the dataset had been one of the critical comments): Identifying a specific pattern in several adjacent rows of a single column - R
But in case that link isn't allowed, here's the content:
I need to remove a specific set of rows from my data whenever they occur. Our survey is an automated telephone survey, and during a call the survey tool will make three attempts to prompt the respondent to enter a response. After the question times out three times, the survey tool hangs up. This mostly happens when the call goes to someone's voicemail.
I would like to identify that pattern whenever it happens so I can exclude it when calculating call time.
The pattern I am looking for looks like this in the Interactions column:
It doesn't HAVE to be Intro. It can be any part of the survey where the tool prompts the respondent for a response THREE times, no response is provided, and so the call fails. But it does have to be sandwiched between "Answer" (the phone picks up) and "Timeout. Call failed." (a failure).
I did try to apply what I learned from yesterday's solution to my other indexing question (about run-length encoding), but I couldn't make it work at all. So here I am.
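To make the target pattern concrete, here is a minimal sketch that slides a five-row window over a character vector and looks for "Answer", then the same prompt exactly three times in a row, then "Timeout. Call failed.". It assumes each prompt attempt is its own row carrying the same label. Only "Answer" and "Timeout. Call failed." come from the question; the toy vector, the other labels ("Intro", "Q1", "Q2", "Q3", "Hangup") and the helper name `find_failed_prompts()` are hypothetical.

```r
# Hypothetical toy column: prompt labels and "Hangup" are made up,
# only "Answer" and "Timeout. Call failed." come from the question
interactions <- c("Answer", "Intro", "Intro", "Intro", "Timeout. Call failed.",
                  "Answer", "Intro", "Q1", "Q2", "Hangup",
                  "Answer", "Q3", "Q3", "Q3", "Timeout. Call failed.")

# Return the row index of every "Answer" that starts the pattern
# Answer -> same prompt n_prompts times -> failure message
find_failed_prompts <- function(x, ans = "Answer",
                                fail = "Timeout. Call failed.",
                                n_prompts = 3) {
  width <- n_prompts + 2                # Answer row + prompts + failure row
  if (length(x) < width) return(integer(0))
  starts <- seq_len(length(x) - width + 1)
  keep <- vapply(starts, function(i) {
    prompts <- x[(i + 1):(i + n_prompts)]
    x[i] == ans &&
      x[i + width - 1] == fail &&
      length(unique(prompts)) == 1 &&   # the same prompt repeated
      !prompts[1] %in% c(ans, fail)     # and it is a real prompt
  }, logical(1))
  starts[keep]
}

find_failed_prompts(interactions)
#> [1]  1 11
```

Because the window is a fixed width, this matches exactly three prompts; `n_prompts` would need changing if the tool's retry count were different.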
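A rough sketch of how run-length encoding could express the same check, assuming the three attempts appear as three consecutive rows with an identical label: `rle()` collapses those rows into a single run of length 3, so the test becomes "a run of length 3 whose previous run is "Answer" and whose next run is "Timeout. Call failed."". The vector and prompt labels are made up for illustration.

```r
# Hypothetical column; prompt labels are invented for the example
interactions <- c("Answer", "Intro", "Intro", "Intro", "Timeout. Call failed.",
                  "Answer", "Intro", "Q1", "Q2", "Partial")

r <- rle(interactions)
ends   <- cumsum(r$lengths)        # last row of each run
starts <- ends - r$lengths + 1     # first row of each run
k <- length(r$values)

# Value of the run before / after each run (NA at the edges)
prev_val <- c(NA, r$values[-k])
next_val <- c(r$values[-1], NA)

# A run qualifies when it is exactly 3 rows long and sits
# between an "Answer" run and a "Timeout. Call failed." run
hit <- r$lengths == 3 &
  prev_val %in% "Answer" &
  next_val %in% "Timeout. Call failed."

# Rows to drop: the Answer row, the three prompts and the failure row
rows_to_drop <- unlist(Map(function(s, e) (s - 1):(e + 1), starts[hit], ends[hit]))
rows_to_drop
#> [1] 1 2 3 4 5

# A logical mask is safe even when nothing matched (rows_to_drop is NULL)
kept <- interactions[!seq_along(interactions) %in% rows_to_drop]
```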
Here's an example dataset:
It covers 4 respondents and every interaction between the survey tool and each respondent (or, essentially, their phone).
Here's the code for the data frame (the link goes to a Google Drive text document containing the code).
The response I got from Rui Barradas was this:
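removeRows <- function(X, col = "Interaction",
                       ans = "Answer",
                       fail = c("Timeout. Call failed.", "Partial", "Enqueueing call")) {
  # Row numbers of every "Answer" (the call was picked up)
  a <- grep(ans, X[[col]])
  # Row numbers of every end-of-call message listed in `fail`
  f <- which(X[[col]] %in% fail)
  # For each end-of-call row, take the last "Answer" row before it
  a <- a[findInterval(f, a)]
  # Blank out every row from that "Answer" through its end-of-call row...
  for (i in seq_along(a)) {
    X[[col]][a[i]:f[i]] <- NA_character_
  }
  # ...then drop the blanked rows (complete.cases() also drops any row
  # that already had an NA in some other column)
  Y <- X[complete.cases(X), , drop = FALSE]
  Y
}
removeRows(survey_data)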
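However, this solution is too broad: it removes every block of rows from an "Answer" to any of the listed end states ("Timeout. Call failed.", "Partial", "Enqueueing call"), whatever happened in between. I need to remove only the rows where three attempts are made to prompt for a response and no response is provided. That is, the rows where the prompt is Intro, there's no response, the question times out, and eventually the call fails.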
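A minimal sketch of the narrower removal, under stated assumptions: the three attempts are three consecutive rows with an identical prompt label, the data has a hypothetical `RespondentID` column alongside `Interaction`, and the prompt labels are invented. It flags only the Answer / prompt × 3 / "Timeout. Call failed." rows within each respondent and drops those, leaving every other call (including "Partial" ones) untouched. The helper names `flag_failed_prompts()` and `remove_failed_prompts()` are hypothetical.

```r
# Hypothetical data; column names and prompt labels are assumptions
survey_data <- data.frame(
  RespondentID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  Interaction  = c("Answer", "Intro", "Intro", "Intro", "Timeout. Call failed.",
                   "Answer", "Intro", "Q1", "Partial",
                   "Answer", "Q1", "Q1", "Q1", "Timeout. Call failed."),
  stringsAsFactors = FALSE
)

# Flag one respondent's rows that form
# Answer -> same prompt exactly n_prompts times -> failure message
flag_failed_prompts <- function(x, ans = "Answer",
                                fail = "Timeout. Call failed.",
                                n_prompts = 3) {
  flag <- logical(length(x))
  if (length(x) == 0) return(flag)
  r <- rle(x)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  k <- length(r$values)
  prev_val <- c(NA, r$values[-k])
  next_val <- c(r$values[-1], NA)
  hit <- which(r$lengths == n_prompts &
               prev_val %in% ans &
               next_val %in% fail &
               !r$values %in% c(ans, fail))
  # Mark the Answer row, the prompt rows and the failure row
  for (j in hit) flag[(starts[j] - 1):(ends[j] + 1)] <- TRUE
  flag
}

remove_failed_prompts <- function(X, id = "RespondentID", col = "Interaction", ...) {
  # split() keeps each respondent separate so a pattern cannot span two
  # respondents; unsplit() returns the flags to the original row order
  flags <- unsplit(lapply(split(X[[col]], X[[id]]), flag_failed_prompts, ...),
                   X[[id]])
  X[!flags, , drop = FALSE]
}

out <- remove_failed_prompts(survey_data)
out  # only respondent 2's four rows remain
```

Rows are removed through a logical mask rather than `complete.cases()`, so NAs in unrelated columns don't make rows disappear (assuming `RespondentID` itself has no missing values).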
Thanks!