
I'm back with my survey data.

This time, I need to remove a specific set of rows from the data whenever they occur. Our survey is an automated telephone survey, and during a call the survey tool makes three attempts to prompt the respondent to enter a response. After the question times out three times, the survey tool hangs up. This mostly happens when the call goes to someone's voicemail.

I would like to identify that pattern whenever it happens so I can exclude it when calculating call time.

The pattern I am looking for looks like this in the Interactions column:

Example pattern

It doesn't HAVE to be Intro. It can be any part of the survey where the tool prompts the respondent for a response THREE times, no response is provided, and the call fails. But it does have to be sandwiched between "Answer" (the phone picks up) and "Timeout. Call failed." (a failure).
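A minimal base-R sketch of that check, assuming each prompt attempt is logged as its own row with identical text (for example "Intro" three times), that rows are already sorted by respondent and call time, and that a five-row window never spans two respondents. `flag_timeout_pattern()` is a hypothetical helper name, and the "Q1" and "Complete" values in the usage lines are made up for illustration. It returns a logical vector marking the "Answer" row, the three prompt rows, and the failure row.

```r
# Hypothetical helper: flags rows that form the pattern
#   "Answer", <same prompt> x3, "Timeout. Call failed."
# Assumes one row per prompt attempt, rows sorted by respondent and time,
# and that a 5-row window never spans two respondents.
flag_timeout_pattern <- function(x, ans = "Answer", fail = "Timeout. Call failed.") {
  x <- as.character(x)
  n <- length(x)
  flag <- logical(n)
  if (n < 5) return(flag)

  i <- seq_len(n - 4)                                  # possible start rows of a 5-row window
  hit <- x[i] == ans & x[i + 4] == fail &              # framed by "Answer" ... failure
         x[i + 1] == x[i + 2] & x[i + 2] == x[i + 3] & # the same prompt three times
         !(x[i + 1] %in% c(ans, fail))                 # the prompt is neither frame value
  for (s in which(hit)) flag[s:(s + 4)] <- TRUE        # which() skips NA comparisons
  flag
}

# Usage on made-up values: rows 1-5 are flagged, rows 6-9 are kept
interactions <- c("Answer", "Intro", "Intro", "Intro", "Timeout. Call failed.",
                  "Answer", "Intro", "Q1", "Complete")
flag_timeout_pattern(interactions)
```

On the question's data this could be used as `survey_data[!flag_timeout_pattern(survey_data$Interactions), ]`; if one respondent's call can run straight into the next one's, applying it within each respondent (for example via `split()`) would be safer.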

I did try to apply what I learned from yesterday's solution to my other indexing question (which used run-length encoding), but I couldn't make it work at all. So, here I am.
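Run-length encoding does fit this problem: base R's `rle()` collapses consecutive identical values into runs, so the pattern becomes three neighboring runs, namely "Answer", a prompt run of length exactly 3, and then "Timeout. Call failed.". The sketch below works under the same assumptions as any per-row approach (one row per prompt attempt, rows already sorted); `flag_timeout_rle()` and its `n_prompts` argument are hypothetical names, and the sample values are made up.

```r
# Hypothetical helper built on run-length encoding (base R's rle()).
# Assumes one row per prompt attempt, with rows sorted by respondent and time.
flag_timeout_rle <- function(x, ans = "Answer", fail = "Timeout. Call failed.",
                             n_prompts = 3) {
  r <- rle(as.character(x))            # rle() refuses factors, so convert first
  k <- length(r$values)
  run_end   <- cumsum(r$lengths)       # last row index of each run
  run_start <- run_end - r$lengths + 1 # first row index of each run
  flag <- logical(length(x))
  if (k < 3) return(flag)

  for (j in 2:(k - 1)) {
    # a prompt repeated exactly n_prompts times, directly after "Answer"
    # and directly before the failure row (%in% keeps NA runs from erroring)
    if (r$values[j - 1] %in% ans && r$values[j + 1] %in% fail &&
        r$lengths[j] == n_prompts) {
      flag[run_end[j - 1]:run_start[j + 1]] <- TRUE
    }
  }
  flag
}

# Usage on made-up values: rows 1-5 are flagged, rows 6-9 are kept
interactions <- c("Answer", "Intro", "Intro", "Intro", "Timeout. Call failed.",
                  "Answer", "Intro", "Q1", "Complete")
flag_timeout_rle(interactions)
```

Requiring the run length to be exactly `n_prompts` means a block with two or four identical prompts is left alone; relax that comparison if the tool's retry count can vary.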

Here's an example dataset:

It covers 15 respondents and every interaction between the survey tool and the respondent (or, essentially, their phone).

The code for the data frame is in a Google Drive text file.

JeniFav
  • You'll get help a lot faster if you create a **minimal** reproducible example. We don't need 1000 rows to understand the problem or to test and demonstrate a solution; in fact, all of those steps are easier if you can give us a ~20-30 row sample of data that fits in the question. And with such a small example, you can manually show us exactly what the expected output is, so there is less ambiguity. – Gregor Thomas Dec 19 '19 at 23:30
  • I actually thought that was minimal; I thought it might be important to show the variation in the records. – JeniFav Dec 23 '19 at 15:28

1 Answer


If I understand the question correctly, the function below removes every block of rows that starts with an "Answer" row and ends with a failure value (there are 3 such values in the question), both ends included.
The name of the column to search defaults to "Interactions", and the answer and failure values also have defaults.
Note that all matching is case sensitive.

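removeRows <- function(X, col = "Interactions",
                       ans = "Answer",
                       fail = c("Timeout. Call failed.", "Partial", "Enqueueing call"))
{
  a <- grep(ans, X[[col]])          # rows where the call is answered (sorted)
  f <- which(X[[col]] %in% fail)    # rows holding a failure value
  k <- findInterval(f, a)           # last "Answer" at or before each failure; 0 = none

  to_drop <- logical(nrow(X))
  for(i in seq_along(f)){
    # skip failures with no preceding "Answer" instead of misaligning a and f
    if(k[i] > 0) to_drop[a[k[i]]:f[i]] <- TRUE
  }
  # remove only the flagged rows; NAs in other columns are left alone
  X[!to_drop, , drop = FALSE]
}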

removeRows(survey_data)
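The pairing of "Answer" rows with failure rows hinges on `findInterval()`: given a sorted vector of "Answer" row numbers, it returns, for each failure row, the index of the last "Answer" row at or before it, or 0 when no "Answer" precedes that failure. The row numbers below are made up purely to show that behavior.

```r
# Made-up row positions, only to show how findInterval() pairs rows.
answer_rows  <- c(1, 6, 12)   # rows where the call was answered (must be sorted)
failure_rows <- c(5, 15)      # rows holding a failure value

k <- findInterval(failure_rows, answer_rows)
k                # 1 3: index of the last "Answer" at or before rows 5 and 15
answer_rows[k]   # 1 12: the call answered at row 6 is never paired with a failure

findInterval(2, c(3, 8))  # 0: no "Answer" before row 2; indexing with 0 selects nothing
```

Because an index of 0 silently selects nothing, a failure that appears before any "Answer" row needs to be skipped explicitly; otherwise the answer and failure vectors can end up different lengths and pair the wrong rows.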
Rui Barradas
  • Thanks! I finally got around to trying this (holidays...). This solution is too broad, but that might be because I didn't do a good job of describing the problem (as illustrated by the question being closed). I specifically need to remove only the rows where 3 attempts are made to prompt a response but no response is provided. So, where the prompt is Intro, there's no response, it times out, and eventually the call fails. – JeniFav Jan 02 '20 at 18:59