Create counter within consecutive runs of certain values

Question

I have an hourly value. I want to count how many consecutive hours the value has been zero since the last time it was not zero. This is an easy job for a spreadsheet or for loop, but I am hoping for a snappy vectorized one-liner to accomplish the task.

x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
df <- data.frame(x, zcount = NA)

df$zcount[1] <- ifelse(df$x[1] == 0, 1, 0)
for(i in 2:nrow(df)) 
  df$zcount[i] <- ifelse(df$x[i] == 0, df$zcount[i - 1] + 1, 0)

Desired output:

R> df
   x zcount
1  1      0
2  0      1
3  1      0
4  0      1
5  0      2
6  0      3
7  1      0
8  1      0
9  0      1
10 0      2

Martin Morgan · Answer 1 · 2011-02-16T17:21:58.653

William Dunlap's posts on R-help are the place to look for all things related to run lengths. His f7 from this post is

f7 <- function(x){ tmp<-cumsum(x);tmp-cummax((!x)*tmp)}

and in the current situation f7(!x). In terms of performance there is

> x <- sample(0:1, 1000000, TRUE)
> system.time(res7 <- f7(!x))
   user  system elapsed 
  0.076   0.000   0.077 
> system.time(res0 <- cumul_zeros(x))
   user  system elapsed 
  0.345   0.003   0.349 
> identical(res7, res0)
[1] TRUE

Prasad Chalasani · Accepted Answer · 2011-02-16T15:13:54.927

23

Here's a way, building on Joshua's rle approach: (EDITED to use seq_len and lapply as per Marek's suggestion)

> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
 [1] 0 1 0 1 2 3 0 0 1 2

UPDATE. Just for kicks, here's another way to do it, around 5 times faster:

cumul_zeros <- function(x)  {
  x <- !x
  rl <- rle(x)
  len <- rl$lengths
  v <- rl$values
  cumLen <- cumsum(len)
  z <- x
  # replace the 0 at the end of each zero-block in z by the 
  # negative of the length of the preceding 1-block....
  iDrops <- c(0, diff(v)) < 0
  z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
  # ... to ensure that the cumsum below does the right thing.
  # We zap the cumsum with x so only the cumsums for the 1-blocks survive:
  x*cumsum(z)
}

Try an example:

> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
 [1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0

Now compare times on a million-length vector:

> x <- sample(0:1, 1000000,T)
> system.time( z <- cumul_zeros(x))
   user  system elapsed 
   0.15    0.00    0.14 
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
   user  system elapsed 
   0.75    0.00    0.75

Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!

edited Feb 16 '11 at 15:13

answered Feb 16 '11 at 11:53

Prasad Chalasani

19,912
7
51
73

2

+1 brilliant one-liner. Little code profiling: `(!x) * unlist(lapply(rle(x)$lengths, seq_len))` (`lapply` is safer and faster, `seq_len` is simplified version of `seq`), approx 2 times faster. – Marek Feb 16 '11 at 12:16
Thanks @Marek. A couple of things are new to me: `seq_len` is faster, nice to know; and why is `lapply` safer? Also `rle` is not particularly fast; I have this nagging feeling there is a way to do this much faster using purely arithmetic operations without having to break up an array and re-assemble, etc (e.g. something involving `cumsum`). – Prasad Chalasani Feb 16 '11 at 12:51
1

`lapply` always gives you list, `sapply` sometimes not, e.g. try your code for `x <- c(0,0,1,1,0,0,1,1)`. Beside `lapply` is sufficient here so why use function based on it. – Marek Feb 16 '11 at 14:39
2

`vapply` is a safer version of `sapply` because you tell it what the output type should be – hadley Feb 16 '11 at 17:15

score 6 · Answer 3 · answered Feb 16 '11 at 04:33

6

rle will "count how many consecutive hours the value has been zero since the last time it was not zero", but not in the format of your "desired output".

Note the lengths for the elements where the corresponding values are zero:

rle(x)
# Run Length Encoding
#   lengths: int [1:6] 1 1 1 3 2 2
#   values : num [1:6] 1 0 1 0 1 0

answered Feb 16 '11 at 04:33

Joshua Ulrich

173,410
32
338
418

1

Handy but I can't get what I need from rle without doing something pretty indecent. – J. Win. Feb 16 '11 at 05:53

score 5 · Answer 4 · answered Sep 20 '18 at 21:32

5

A simple base R approach:

ave(!x, cumsum(x), FUN = cumsum)

#[1] 0 1 0 1 2 3 0 0 1 2

answered Sep 20 '18 at 21:32

989

12,579
5
31
53

mdsumner · Answer 5 · 2011-02-16T05:32:12.090

3

One-liner, not exactly super elegant:

x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0) 

 unlist(lapply(split(x, c(0, cumsum(abs(diff(!x == 0))))), function(x) (x[1] == 0) * seq(length(x))))

edited Feb 16 '11 at 05:32

answered Feb 16 '11 at 05:02

mdsumner

29,099
6
83
91

GuedesBF · Answer 6 · 2021-07-06T21:54:24.787

1

Using purr::accumulate() is very straightforward, so this tidyverse solution may add some value here. I must acknowledge it is definitely not the fastest, as it calls the same function length(x)times.

library(purrr)

accumulate(x==0, ~ifelse(.y!=0, .x+1, 0))

 [1] 0 1 0 1 2 3 0 0 1 2

edited Jul 06 '21 at 21:54

answered Jul 06 '21 at 21:45

GuedesBF

8,409
5
19
37

Create counter within consecutive runs of certain values

6 Answers6

Linked

Related