
Is there a convenient and elegant existing approach for finding contiguous regions of TRUE (or 1) values in a logical time series? I am looking for something that returns a summary of the form:

Region_id               Start                Stop
        1 YYYY-MM-DD HH:MM:SS YYYY-MM-DD HH:MM:SS
        2 YYYY-MM-DD HH:MM:SS YYYY-MM-DD HH:MM:SS
        ... etc

Example input ts:

library(xts)

# Build an N-row xts object of NAs at `by`-minute steps, starting at `from`
mins <- function (N, from = as.character(Sys.time()), cols = 1, by = 1) 
{
  deltas <- seq(from = 0, by = 60 * by, length.out = N)
  nacol <- matrix(data = NA, ncol = cols, nrow = N)
  xts(x = nacol, order.by = strptime(from, format = "%Y-%m-%d %H:%M") + 
      deltas)
}

d <- mins(N=20,cols=1)
d[,1] <- FALSE; d[5:12,1] <- TRUE; d[14:20,1] <- TRUE
d
                     [,1]
2012-12-18 20:48:00 FALSE
2012-12-18 20:49:00 FALSE
2012-12-18 20:50:00 FALSE
2012-12-18 20:51:00 FALSE
2012-12-18 20:52:00  TRUE
2012-12-18 20:53:00  TRUE
2012-12-18 20:54:00  TRUE
2012-12-18 20:55:00  TRUE
2012-12-18 20:56:00  TRUE
2012-12-18 20:57:00  TRUE
2012-12-18 20:58:00  TRUE
2012-12-18 20:59:00  TRUE
2012-12-18 21:00:00 FALSE
2012-12-18 21:01:00  TRUE
2012-12-18 21:02:00  TRUE
2012-12-18 21:03:00  TRUE
2012-12-18 21:04:00  TRUE
2012-12-18 21:05:00  TRUE
2012-12-18 21:06:00  TRUE
2012-12-18 21:07:00  TRUE

# So much for the _idealized_ input. The function I am looking for should return a data.frame
# like this for the d object above:
Region_id               Start                Stop
        1 2012-12-18 20:52:00 2012-12-18 20:59:00
        2 2012-12-18 21:01:00 2012-12-18 21:07:00

This is probably a common task in binary signal processing, so it seemed worth asking about. Of course, the example is idealized and only a starting point; real data will be more complex.

Petr Matousu
  • 2
    Your question isn't clear. Can you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? ...and don't let @joran edit it out of your question this time. ;-) – Joshua Ulrich Dec 18 '12 at 19:31
  • +1 for joran related comment – Petr Matousu Dec 18 '12 at 19:36
  • + another! Take a look at `lubridate` and `as.period`... but that's all I've got until I understand what a logical time series is. – Justin Dec 18 '12 at 19:50

1 Answer


First, use rle to find the contiguous blocks, then create an indicator to separate each block.

r <- rle(coredata(d)[,1])
ind <- rep(seq_along(r$lengths), r$lengths)
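
To see what these two lines produce, here is a small sketch on a plain logical vector. The vector `x` is made up for illustration and merely stands in for `coredata(d)[,1]`; only base R is used.

```r
# Hypothetical toy vector standing in for coredata(d)[, 1]
x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)

r <- rle(x)
r$lengths  # length of each run: 2 3 1 2
r$values   # value of each run:  FALSE TRUE FALSE TRUE

# One integer label per observation, constant within each run
ind <- rep(seq_along(r$lengths), r$lengths)
ind        # 1 1 2 2 2 3 4 4
```

Each run, TRUE or FALSE, gets its own label, so the FALSE runs still have to be filtered out later.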

Now you can use the indicator to split the index of your xts object, and take the min and max of each contiguous block.

s <- split(index(d), ind)
l <- lapply(s, function(x) data.frame(start=min(x), stop=max(x)))

Then you can rbind the results into one data.frame, add a region column from the run values, keep only the rows for the TRUE runs, and take the cumulative sum to number the regions. Note that my times are different due to timezone differences, but the concept is correct.

out <- do.call(rbind, l)
out$region <- r$values
out <- out[out$region,]
out$region <- cumsum(out$region)
out
#                 start                stop region
# 2 2012-12-18 20:45:00 2012-12-18 20:52:00      1
# 4 2012-12-18 20:54:00 2012-12-18 21:00:00      2
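
The steps above could also be wrapped into one function. The following is only a sketch under stated assumptions: the name `find_regions` and its arguments `flag` and `time` are hypothetical, it takes a plain logical vector plus a matching vector of timestamps rather than an xts object, and it computes run boundaries with `cumsum` instead of `split`, which is a variation on the approach rather than the same code.

```r
# Hypothetical helper (not part of xts or base R):
# `flag` is a logical (or 0/1) vector, `time` the matching timestamps.
find_regions <- function(flag, time) {
  stopifnot(length(flag) == length(time), !anyNA(flag))
  r <- rle(as.logical(flag))
  ends <- cumsum(r$lengths)        # last position of each run
  starts <- ends - r$lengths + 1   # first position of each run
  keep <- r$values                 # TRUE for the runs we want to report
  data.frame(
    Region_id = seq_len(sum(keep)),
    Start = time[starts[keep]],
    Stop = time[ends[keep]]
  )
}

# Toy data mirroring the question: 20 one-minute steps,
# TRUE at positions 5:12 and 14:20 (UTC is an arbitrary choice here)
time <- as.POSIXct("2012-12-18 20:48:00", tz = "UTC") + 60 * (0:19)
flag <- rep(FALSE, 20)
flag[5:12] <- TRUE
flag[14:20] <- TRUE

regions <- find_regions(flag, time)
regions
```

For an xts object such as `d`, the call would presumably be `find_regions(coredata(d)[,1], index(d))`. The sketch rejects NA values outright; real data with gaps would need a decision on whether an NA ends a region.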
Joshua Ulrich