1

Hi friends i have a problem finding if multiple datetime ranges overlap each other and if yes the time period for which they overlap.I have refereed following links Determine Whether Two Date Ranges Overlap and Algorithm to detect overlapping periods and some more.

Don't know if this is right,i have sample explanation for n=3.

Say I have 'n' switches sw1,sw2 & sw3.State is ON/OFF state ie 1/0.

Switches,State,Intime,Outtime

sw3,1,9:00:00,10:40:00
sw2,1,9:30:00,10:15:00
sw1,1,10:00:00,11:00:00
sw2,1,10:20:00,10:30:00

I have come across this one possibility.There might be more.Still looking for others.Here the common time period is from 10:00 to 10:15 ie 15 mins and 10:20 to 10:30 ie 10 mins.The combined time period for which these switches were ON('1') is 25 mins.

                 10:00                           11:00
              sw1 |-----------------------------------|
       9:30       10:15   10:20     10:30
     sw2 |-------------|      |-------|
 9:00                                     10:40 
sw3 |----------------------------------------| 

Generalizing this datetime for n overlapping switches is a difficult task.I m still working on it so any suggestions or modifications are welcomed.Thank you.

Community
  • 1
  • 1
OnkarK
  • 41
  • 9

2 Answers2

2

1) Based on the sample data we assume that the data is in the form of hh:mm:00 where hh < 24.

Read in the test data. Create two functions which convert a character string of the form hh:mm:00 to number of minutes and a function which converts number of minutes to a chron "times" object. Create minute by minute sequences for each row of the data giving the Intervals list. Union those sequences which correspond to the same switch giving the list Intervals.u and then intersect the components of that list to give the sequence Intersection. Compute the runs, r, in Intersection to give a set of start and end points. Finally calcualte the number of minutes and converting that to "times" class the duration. (The number of minutes and duration only depend on r and Intersection so we could skip the lines ending in ## if intervals.df were not needed.)

# test data
Lines <- "Switches,State,Intime,Outtime
sw3,1,9:00:00,10:40:00
sw2,1,9:30:00,10:15:00
sw1,1,10:00:00,11:00:00
sw2,1,10:20:00,10:30:00"
DF <- read.csv(text = Lines, as.is = TRUE)

library(chron)

to.num <- function(x) floor(as.numeric(times(x)) * 24 * 60 + 1e-6)
to.times <- function(x) times(x / (24 * 60))

Seq <- function(r) seq(to.num(DF$Intime[r]), to.num(DF$Outtime[r]))    
Intervals <- lapply(1:nrow(DF), Seq)
Intervals.u <- lapply(split(Intervals, DF$Switches), 
     function(L) Reduce(union, L))
Intersection <- Reduce(intersect, Intervals.u)

r <- rle(c(FALSE, diff(Intersection) == 1))

i.ends <- cumsum(r$lengths)[r$values] ##
ends <- to.times(Intersection[i.ends]) ##
starts <- ends - to.times(r$lengths[r$values]) ##
intervals.df <- data.frame(start = starts, end = ends); intervals.df ##
##         start      end
##    1 10:00:00 10:15:00
##    2 10:20:00 10:30:00

mins <- length(Intersection) - sum(r$values); mins
## [1] 25
duration <- to.times(mins); duration
## [1] 00:25:00

2) Regarding comments pertaining to speed we could, instead, use the IRanges package which encodes ranges efficiently and also reduces the code size slightly:

library(IRanges)
Intervals <- IRanges(to.num(DF$Intime), to.num(DF$Outtime))
Intersection <- Reduce(intersect, split(Intervals, DF$Switches))

intervals.df <- data.frame(start = to.times(start(Intersection)), 
                           end = to.times(end(Intersection)))
intervals.df
##      start      end
## 1 10:00:00 10:15:00
## 2 10:20:00 10:30:00

mins <- sum(width(Intersection) - 1); mins
## [1] 25
duration <- to.times(mins); duration
## [1] 00:25:00

Updates Some fixes and better variable names. Further improvements. Added (2).

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • This is a superb idea for this problem I have almost tried it for all combinations.I had a question as to how it will work for a larger timespan.Say if we add dates along with time and we had to calculate if switches were "ON"(1) for a day.In that case will minute wise execution take more time or larger space..? – OnkarK Jul 15 '14 at 11:50
  • Regarding speed we could efficiently encode the ranges using the IRanges package. See variation (2) above. – G. Grothendieck Jul 15 '14 at 19:23
0

One way of doing it would be:

  1. Calculate the unique minutes/seconds between the Intime and Outtime for each switch. E.g. if a switch turns on at 9:00 and goes off at 9:02, the unique minutes it was on for span 9:00 and 9:01.
  2. Count how many times each unique minute/second appears across all switches.
  3. Where any minute/second occurs as many times as there are switches (i.e. three in your case), then all switches must be on for that minute/second.

Using that logic here's a potential solution (where your data is stored in a data frame x):

# Function to convert string to time.
asTime <- function (tm) as.POSIXlt(tm, format = '%H:%M:%S')

# Calculate unique minutes between Intimes and Outtimes.
minSpan <- function (start, end) seq(asTime(start), asTime(end) - 1, 'min')

# Calculate the time span in minutes for each row.
spans <- mapply(minSpan, x$Intime, x$Outtime)

# Count how many times each minute appears.
counts <- table(do.call(c, spans))

# Total number of switches.
switches <- length(unique(x$Switches))

# Count minutes where all switches have been on.
length(counts[counts == switches])

This will give you precision to one minute because that seems to be what you were displaying in your question. Although you can easily change it to seconds be changing 'min' to 'sec' in the minSpan() function.


In minSpan() I subtract one minute from the Outtime:

minSpan <- function (start, end) seq(asTime(start), asTime(end) - 1, 'min')

That's because if you were to count the minutes between e.g. 10:00 and 10:02, seq() will return three minutes, 10:00, 10:01, 10:02. But in reality the switch turned off at 10:02, so you really want the span from 10:00 to 10:01.


Anyway, this solution seems to work for the small example you've given. Depending on how large your data is I'd expect this to be slow enough but that might not be an issue.

Ciarán Tobin
  • 7,306
  • 1
  • 29
  • 45
  • Thanks a lot.This Solution works very well too.I have tried to test it with large data it is taking a bit more time.Ill try with some other combinations n let u know. – OnkarK Jul 15 '14 at 12:28