18

I have an hourly value. I want to count how many consecutive hours the value has been zero since the last time it was not zero. This is an easy job for a spreadsheet or for loop, but I am hoping for a snappy vectorized one-liner to accomplish the task.

x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
df <- data.frame(x, zcount = NA)

df$zcount[1] <- ifelse(df$x[1] == 0, 1, 0)
for(i in 2:nrow(df)) 
  df$zcount[i] <- ifelse(df$x[i] == 0, df$zcount[i - 1] + 1, 0)

Desired output:

R> df
   x zcount
1  1      0
2  0      1
3  1      0
4  0      1
5  0      2
6  0      3
7  1      0
8  1      0
9  0      1
10 0      2
Henrik
  • 65,555
  • 14
  • 143
  • 159
J. Win.
  • 6,662
  • 7
  • 34
  • 52

6 Answers6

25

William Dunlap's posts on R-help are the place to look for all things related to run lengths. His f7 from this post is

f7 <- function(x){ tmp<-cumsum(x);tmp-cummax((!x)*tmp)}

and in the current situation f7(!x). In terms of performance there is

> x <- sample(0:1, 1000000, TRUE)
> system.time(res7 <- f7(!x))
   user  system elapsed 
  0.076   0.000   0.077 
> system.time(res0 <- cumul_zeros(x))
   user  system elapsed 
  0.345   0.003   0.349 
> identical(res7, res0)
[1] TRUE
Martin Morgan
  • 45,935
  • 7
  • 84
  • 112
23

Here's a way, building on Joshua's rle approach: (EDITED to use seq_len and lapply as per Marek's suggestion)

> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
 [1] 0 1 0 1 2 3 0 0 1 2

UPDATE. Just for kicks, here's another way to do it, around 5 times faster:

cumul_zeros <- function(x)  {
  x <- !x
  rl <- rle(x)
  len <- rl$lengths
  v <- rl$values
  cumLen <- cumsum(len)
  z <- x
  # replace the 0 at the end of each zero-block in z by the 
  # negative of the length of the preceding 1-block....
  iDrops <- c(0, diff(v)) < 0
  z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
  # ... to ensure that the cumsum below does the right thing.
  # We zap the cumsum with x so only the cumsums for the 1-blocks survive:
  x*cumsum(z)
}

Try an example:

> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
 [1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0

Now compare times on a million-length vector:

> x <- sample(0:1, 1000000,T)
> system.time( z <- cumul_zeros(x))
   user  system elapsed 
   0.15    0.00    0.14 
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
   user  system elapsed 
   0.75    0.00    0.75 

Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!

Prasad Chalasani
  • 19,912
  • 7
  • 51
  • 73
  • 2
    +1 brilliant one-liner. Little code profiling: `(!x) * unlist(lapply(rle(x)$lengths, seq_len))` (`lapply` is safer and faster, `seq_len` is simplified version of `seq`), approx 2 times faster. – Marek Feb 16 '11 at 12:16
  • Thanks @Marek. A couple of things are new to me: `seq_len` is faster, nice to know; and why is `lapply` safer? Also `rle` is not particularly fast; I have this nagging feeling there is a way to do this much faster using purely arithmetic operations without having to break up an array and re-assemble, etc (e.g. something involving `cumsum`). – Prasad Chalasani Feb 16 '11 at 12:51
  • 1
    `lapply` always gives you list, `sapply` sometimes not, e.g. try your code for `x <- c(0,0,1,1,0,0,1,1)`. Beside `lapply` is sufficient here so why use function based on it. – Marek Feb 16 '11 at 14:39
  • 2
    `vapply` is a safer version of `sapply` because you tell it what the output type should be – hadley Feb 16 '11 at 17:15
6

rle will "count how many consecutive hours the value has been zero since the last time it was not zero", but not in the format of your "desired output".

Note the lengths for the elements where the corresponding values are zero:

rle(x)
# Run Length Encoding
#   lengths: int [1:6] 1 1 1 3 2 2
#   values : num [1:6] 1 0 1 0 1 0
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
5

A simple base R approach:

ave(!x, cumsum(x), FUN = cumsum)

#[1] 0 1 0 1 2 3 0 0 1 2
989
  • 12,579
  • 5
  • 31
  • 53
3

One-liner, not exactly super elegant:

x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0) 

 unlist(lapply(split(x, c(0, cumsum(abs(diff(!x == 0))))), function(x) (x[1] == 0) * seq(length(x))))
mdsumner
  • 29,099
  • 6
  • 83
  • 91
1

Using purr::accumulate() is very straightforward, so this tidyverse solution may add some value here. I must acknowledge it is definitely not the fastest, as it calls the same function length(x)times.

library(purrr)

accumulate(x==0, ~ifelse(.y!=0, .x+1, 0))

 [1] 0 1 0 1 2 3 0 0 1 2
GuedesBF
  • 8,409
  • 5
  • 19
  • 37