Objective
I have a dataset that tracks when someone was editing a message, left, and came back to resume editing that same message. I am trying to group this one message together and find its total duration.
Problem
I need to first group the message together if the following conditions apply:
if the Folder == 'out'
OR drafts
, the Message == ""
, Edit == "T"
, and if the contents in the
Subject and Re columns are consecutively the same.
Ideally, this would give me group A along with its duration. For example, this first 'block' of data would be labeled 'Group A' , would have a start time of 1/2/2020 1:00:01 AM and an end time of 1/2/2020 1:00:30 AM
Furthermore, I would like to 'match' group A with another 'block' of data if the last row of the Subject, Re and Length column matches its first row. So the second Group A would have a start time of 1/2/2020 1:02:00 AM and an end time of 1/2/2020 1:02:05 AM.
Subject Re Length Folder Message Date Edit
a@mail.com,b@mail.com 80 out 1/2/2020 1:00:01 AM T
a@mail.com,b@mail.com 80 out 1/2/2020 1:00:05 AM T
hey a@mail.com,b@mail.com 80 out 1/2/2020 1:00:10 AM T
hey a@mail.com,b@mail.com 80 out 1/2/2020 1:00:15 AM T
hey a@mail.com,b@mail.com 80 out 1/2/2020 1:00:30 AM T
hey a@mail.com,b@mail.com 80 draft 1/2/2020 1:02:00 AM T
hey a@mail.com,b@mail.com 80 draft 1/2/2020 1:02:05 AM T
hey a@mail.com,b@mail.com 80 out 1/2/2020 1:03:10 AM T
hey a@mail.com,b@mail.com 80 out 1/2/2020 1:03:20 AM T
Desired Output
Start End Duration Group
1/2/2020 1:00:10 AM 1/2/2020 1:00:30 AM 20 A
1/2/2020 1:02:00 AM 1/2/2020 1:02:05 AM 5 A
1/2/2020 1:03:10 AM 1/2/2020 1:03:20 AM 10 A
dput:
structure(list(Subject = structure(c(1L, 1L, 2L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("", "hey"), class = "factor"),
Recipient = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 2L), .Label = c("", "sarah@mail.com,gee@mail.com"
), class = "factor"), Length = c(80L, 80L, 80L, 80L, 80L,
NA, NA, 80L, 80L, NA, NA, 100L, 100L), Folder = structure(c(3L,
3L, 3L, 3L, 3L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("",
"draft", "out"), class = "factor"), Message = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Date = structure(c(2L,
3L, 4L, 5L, 6L, 1L, 1L, 7L, 8L, 1L, 1L, 9L, 10L), .Label = c("",
"1/2/2020 1:00:01 AM", "1/2/2020 1:00:05 AM", "1/2/2020 1:00:10 AM",
"1/2/2020 1:00:15 AM", "1/2/2020 1:00:30 AM", "1/2/2020 1:02:00 AM",
"1/2/2020 1:02:05 AM", "1/2/2020 1:03:00 AM", "1/2/2020 1:03:20 AM"
), class = "factor"), Edit = c(TRUE, TRUE, TRUE, TRUE, TRUE,
NA, NA, TRUE, TRUE, NA, NA, TRUE, TRUE)), class = "data.frame", row.names = c(NA,
-13L))
I am thinking I may have to do an if/else statement or some sort of filter. Any help/suggestion or advice would be greatly appreciated.
library(tidyverse)
library(lubridate)
df1<-df2 %>%
mutate(Date = lubridate::mdy_hms(Date),
cond = Edit == "T" & ItemFolderName == "out" | Folder == "drafts" & Message == "" & Subject == ? & Re == ? & Length == ? ,
grp = cumsum(!cond)) %>%
filter(cond) %>%
group_by(grp) %>%