Ultimately I would like to group the data into 'chunks' of consecutive rows where the Folder column contains the string 'Out', taking into account the DATE associated with each row. Is there a way to create a chunk for each run of 'Out' rows and compute its duration?
I have a dataset, df:
Folder DATE
Out 9/9/2019 5:46:00
Out 9/9/2019 5:46:01
Out 9/9/2019 5:46:02
In 9/9/2019 5:46:03
In 9/9/2019 5:46:04
Out 9/10/2019 6:00:01
Out 9/10/2019 6:00:02
In 9/11/2019 7:50:00
In 9/11/2019 7:50:01
I would like this output:
New Variable Duration
Out1 2 sec
Out2 1 sec
I have included the dput:
structure(list(Folder = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 1L), .Label = c("In", "Outdata"), class = "factor"), Date = structure(c(3L,
3L, 3L, 3L, 3L, 1L, 1L, 2L, 2L), .Label = c("9/10/2019 6:00",
"9/11/2019 7:50", "9/9/2019 5:46"), class = "factor")), class = "data.frame", row.names = c(NA,
-9L))
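Both columns in the dput are factors, and its Date labels only go down to the minute, so the strings need converting to date-times before any duration arithmetic. A minimal sketch of that conversion, assuming month/day/year order, a 24-hour clock, and UTC (the time zone is my assumption, not something the data states):

```r
# Rebuild the data frame from the dput above
df <- structure(list(
  Folder = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L),
                     .Label = c("In", "Outdata"), class = "factor"),
  Date = structure(c(3L, 3L, 3L, 3L, 3L, 1L, 1L, 2L, 2L),
                   .Label = c("9/10/2019 6:00", "9/11/2019 7:50", "9/9/2019 5:46"),
                   class = "factor")),
  class = "data.frame", row.names = c(NA, -9L))

# Convert factor -> character -> POSIXct; tz = "UTC" is an assumption
df$Date <- as.POSIXct(as.character(df$Date),
                      format = "%m/%d/%Y %H:%M", tz = "UTC")

str(df)
```

Going through as.character() first matters because calling as.numeric() on a factor would return the internal level codes rather than the date strings.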
What I have tried so far:
#Loading appropriate libraries
library(dplyr)
library(lubridate)
Creating a new variable that first groups the data by Folder, so that the 'Out' rows are together.
(However, this is where I am not sure what to do, because I want a separate group for each run of 'Out' rows, along with its duration, so that I can ultimately plot the durations in a histogram.)
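newdf <- df %>%
  # Parse DATE as month/day/year with a 24-hour clock (the data has no AM/PM)
  mutate(DATE = as.POSIXct(as.character(DATE),
                           format = "%m/%d/%Y %H:%M:%S", tz = "UTC")) %>%
  group_by(Folder) %>%
  # Gives one duration per Folder value, not one per run of 'Out' rows
  summarise(Duration = difftime(max(DATE), min(DATE), units = "secs"))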
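One possible way to get a separate chunk for each run of 'Out' rows is to build a run counter that increases whenever the row switches between 'Out' and not 'Out', then summarise within each run. The following is only a sketch under some assumptions: the data frame is rebuilt from the table shown earlier (with seconds), the columns are named Folder and DATE, the time zone is UTC, and the names is_out, run_id, and Chunk are made up for illustration.

```r
library(dplyr)

# Sample data rebuilt from the table in the question (column names assumed)
df <- data.frame(
  Folder = c("Out", "Out", "Out", "In", "In", "Out", "Out", "In", "In"),
  DATE = c("9/9/2019 5:46:00", "9/9/2019 5:46:01", "9/9/2019 5:46:02",
           "9/9/2019 5:46:03", "9/9/2019 5:46:04",
           "9/10/2019 6:00:01", "9/10/2019 6:00:02",
           "9/11/2019 7:50:00", "9/11/2019 7:50:01"),
  stringsAsFactors = FALSE
)

chunks <- df %>%
  mutate(
    # tz = "UTC" is an assumption; the data does not state a time zone
    DATE = as.POSIXct(DATE, format = "%m/%d/%Y %H:%M:%S", tz = "UTC"),
    # TRUE where Folder contains the string "Out"
    is_out = grepl("Out", Folder),
    # run_id increases by one each time is_out flips between TRUE and FALSE
    run_id = cumsum(is_out != lag(is_out, default = first(is_out)))
  ) %>%
  filter(is_out) %>%
  group_by(run_id) %>%
  # Duration of a run = last timestamp minus first timestamp, in seconds
  summarise(Duration = as.numeric(difftime(max(DATE), min(DATE), units = "secs"))) %>%
  mutate(Chunk = paste0("Out", row_number())) %>%
  select(Chunk, Duration)

print(chunks)
```

On the sample rows this should give Out1 with 2 seconds and Out2 with 1 second, matching the output I am after. The Duration column could then be passed to hist(chunks$Duration) for the histogram.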
I will continue researching; all suggestions are appreciated.