1) Based on the sample data, we assume that the times are of the form hh:mm:00 where hh < 24.
Read in the test data. Create two functions: to.num, which converts a character string of the form hh:mm:00 to a number of minutes, and to.times, which converts a number of minutes to a chron "times" object. Create a minute-by-minute sequence for each row of the data, giving the list Intervals. Union those sequences which correspond to the same switch, giving the list Intervals.u, and then intersect the components of that list to give the sequence Intersection. Compute the runs, r, in Intersection to give a set of start and end points. Finally, calculate the number of minutes, mins, and convert it to "times" class to give duration. (The number of minutes and the duration depend only on r and Intersection, so we could skip the lines ending in ## if intervals.df were not needed.)
# test data
Lines <- "Switches,State,Intime,Outtime
sw3,1,9:00:00,10:40:00
sw2,1,9:30:00,10:15:00
sw1,1,10:00:00,11:00:00
sw2,1,10:20:00,10:30:00"
DF <- read.csv(text = Lines, as.is = TRUE)
library(chron)
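# convert "hh:mm:ss" strings to whole minutes since midnight (1e-6 guards against rounding error)
to.num <- function(x) floor(as.numeric(times(x)) * 24 * 60 + 1e-6)
# convert a number of minutes to a chron "times" object (a fraction of a day)
to.times <- function(x) times(x / (24 * 60))
# minute-by-minute sequence for row r of DF
Seq <- function(r) seq(to.num(DF$Intime[r]), to.num(DF$Outtime[r]))
Intervals <- lapply(seq_len(nrow(DF)), Seq)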
Intervals.u <- lapply(split(Intervals, DF$Switches),
                      function(L) Reduce(union, L))
Intersection <- Reduce(intersect, Intervals.u)
r <- rle(c(FALSE, diff(Intersection) == 1))
i.ends <- cumsum(r$lengths)[r$values] ##
ends <- to.times(Intersection[i.ends]) ##
starts <- ends - to.times(r$lengths[r$values]) ##
intervals.df <- data.frame(start = starts, end = ends); intervals.df ##
##      start      end
## 1 10:00:00 10:15:00
## 2 10:20:00 10:30:00
mins <- length(Intersection) - sum(r$values); mins
## [1] 25
duration <- to.times(mins); duration
## [1] 00:25:00
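To see how the rle step recovers the start and end points, here is a minimal sketch on a small made-up vector. The vector x is hypothetical and not part of the question's data; the remaining lines mirror the code above without the conversion to "times".

```r
# Hypothetical toy input: two runs of consecutive integers, 3..6 and 10..12
x <- c(3, 4, 5, 6, 10, 11, 12)

# TRUE marks an element that is exactly 1 more than the previous element;
# the leading FALSE stands for the first element, which always starts a run
r <- rle(c(FALSE, diff(x) == 1))

# positions in x at which each TRUE run ends
i.ends <- cumsum(r$lengths)[r$values]
ends <- x[i.ends]

# a TRUE run of length n means the run began n steps before its end
starts <- ends - r$lengths[r$values]

data.frame(start = starts, end = ends)
```

Note that an isolated value with no consecutive neighbour produces no TRUE run, so it does not appear among the start and end points. That seems harmless here, since such a point would contribute zero minutes.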
2) Regarding the comments about speed, we could instead use the IRanges package, which encodes ranges efficiently and also reduces the code size slightly. This reuses DF, to.num and to.times from (1):
library(IRanges)
Intervals <- IRanges(to.num(DF$Intime), to.num(DF$Outtime))
Intersection <- Reduce(intersect, split(Intervals, DF$Switches))
intervals.df <- data.frame(start = to.times(start(Intersection)),
                           end = to.times(end(Intersection)))
intervals.df
##      start      end
## 1 10:00:00 10:15:00
## 2 10:20:00 10:30:00
mins <- sum(width(Intersection) - 1); mins
## [1] 25
duration <- to.times(mins); duration
## [1] 00:25:00
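The - 1 in the mins line reflects that these ranges include both endpoints, so the width of a range counts minute marks rather than elapsed minutes (an IRanges width is end - start + 1). A minimal base R sketch of the same arithmetic, using the first interval from the output above as plain integers; the names first_min, last_min, marks and elapsed are made up for illustration:

```r
# First overlap from the output, 10:00 to 10:15, in minutes since midnight
first_min <- 600
last_min <- 615

# number of minute marks when both endpoints are included
marks <- length(first_min:last_min)

# elapsed minutes between the two endpoints
elapsed <- last_min - first_min

c(marks = marks, elapsed = elapsed, difference = marks - elapsed)
```

The difference is 1 for every range, which is why one is subtracted per range before summing.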
Updates: Some fixes and better variable names. Further improvements. Added (2).