I've got a dataset coming from a support ticketing system that logs each click made by an agent in classifying and responding to customer requests. The system assigns a new hist_id to each click, but an agent will click several fields, triggering several rows in the table, in what they consider a single "interaction".
My goal is to calculate a handle time for each of these interaction by doing a diff on the first and last modify_time values in each group.
I'm stuck currently because an agent will have multiple interactions with a case throughout the day.
Here's a sample dataframe:
hist_id <- c(1234, 2345, 3456, 4567, 5678, 6789, 7890)
case_id <- c(1, 1, 1, 1, 1, 1, 1)
agent_name <- c("John", "John", "John", "Paul", "Paul", "John", "John")
modify_time <- as.POSIXct(c(1510095120, 1510095180, 1510095240, 1510098600, 1510098720, 1510135200, 1510135320), origin = "1970-01-01")
df <- data.frame(hist_id, case_id, agent_name, modify_time)
Using group by on the case_id and agent_name groups all rows that match the criteria, as expected:
df %>% group_by(case_id, agent_name) %>% mutate(first = first(modify_time), last = last(modify_time), diff = min(difftime(last, first)))
Which gives me this:
# A tibble: 7 x 7
# Groups: case_id, agent_name [2]
hist_id case_id agent_name modify_time first last diff
<dbl> <dbl> <fctr> <dttm> <dttm> <dttm> <time>
1 1234 1 John 2017-11-07 16:52:00 2017-11-07 16:52:00 2017-11-08 04:02:00 40200 secs
2 2345 1 John 2017-11-07 16:53:00 2017-11-07 16:52:00 2017-11-08 04:02:00 40200 secs
3 3456 1 John 2017-11-07 16:54:00 2017-11-07 16:52:00 2017-11-08 04:02:00 40200 secs
4 4567 1 Paul 2017-11-07 17:50:00 2017-11-07 17:50:00 2017-11-07 17:52:00 120 secs
5 5678 1 Paul 2017-11-07 17:52:00 2017-11-07 17:50:00 2017-11-07 17:52:00 120 secs
6 6789 1 John 2017-11-08 04:00:00 2017-11-07 16:52:00 2017-11-08 04:02:00 40200 secs
7 7890 1 John 2017-11-08 04:02:00 2017-11-07 16:52:00 2017-11-08 04:02:00 40200 secs
Where John's true first and last modify_times are returned. However, I need to group the consecutive matches of case_id and agent_name, so that Paul's interaction is considered. So three interactions are recorded here: one from John, one from Paul, and a second by John.
Desired output would be something like this:
# A tibble: 7 x 7
# Groups: case_id, agent_name [2]
hist_id case_id agent_name modify_time first last diff
<dbl> <dbl> <fctr> <dttm> <dttm> <dttm> <time>
1 1234 1 John 2017-11-07 16:52:00 2017-11-07 16:52:00 2017-11-07 16:54:00 120 secs
2 2345 1 John 2017-11-07 16:53:00 2017-11-07 16:52:00 2017-11-07 16:54:00 120 secs
3 3456 1 John 2017-11-07 16:54:00 2017-11-07 16:52:00 2017-11-07 16:54:00 120 secs
4 4567 1 Paul 2017-11-07 17:50:00 2017-11-07 17:50:00 2017-11-07 17:52:00 120 secs
5 5678 1 Paul 2017-11-07 17:52:00 2017-11-07 17:50:00 2017-11-07 17:52:00 120 secs
6 6789 1 John 2017-11-08 04:00:00 2017-11-08 04:00:00 2017-11-08 04:02:00 120 secs
7 7890 1 John 2017-11-08 04:02:00 2017-11-08 04:00:00 2017-11-08 04:02:00 120 secs