8

I'm trying to calculate a sequence of win streaks for a binary vector. Given a vector

set.seed(2)
x <- sample(c(0,1), 10, replace = TRUE)
[1] 0 1 1 0 1 1 0 1 0 1

I want to calculate the cumulative sum of ones with a "reset" every time there's a zero. So, in this case, the output of the function should be

[1] 0 1 2 0 1 2 0 1 0 1

What's the easiest way to do this in R?

user3294195

3 Answers

18

We can use ave with a grouping variable built from cumsum(x == 0), which starts a new group at every 0 in the vector, and then number the elements within each group with seq_along. Subtracting 1 makes the 0 that opens each group count as 0.

ave(x, cumsum(x==0), FUN = seq_along) - 1
#[1] 0 1 2 0 1 2 0 1 0 1
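
To see why this works, it can help to look at the intermediate values. The sketch below only unpacks the one-liner above for the example vector; the names grp and pos are introduced here for illustration and are not part of the original answer.

```r
x <- c(0, 1, 1, 0, 1, 1, 0, 1, 0, 1)

# A new group starts at every 0
grp <- cumsum(x == 0)
grp
# [1] 1 1 1 2 2 2 3 3 4 4

# 1-based position of each element within its group
pos <- ave(x, grp, FUN = seq_along)
pos
# [1] 1 2 3 1 2 3 1 2 1 2

# Shift by one so that the 0 opening each group counts as 0
pos - 1
# [1] 0 1 2 0 1 2 0 1 0 1
```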
Ronak Shah
  • This will be plenty fast for almost any occasion, but if someone is looking for an Rcpp option for this query, see here: https://stackoverflow.com/a/51054267/496803 – thelatemail Aug 13 '18 at 01:55
  • This doesn't work if the sequence starts with a success (e.g., `x <- c(1, 1, 1, 0, 1, 1, 0, 1, 0, 1)`). I think this should work: `c(ave(c(0, x), cumsum(c(0, x) == 0), FUN = seq_along) - 1)[-1]` – Nat May 15 '20 at 19:30
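
For inputs that may start with a 1, one possible base R alternative is to subtract from each index the index of the most recent 0, tracked with cummax. This is a different technique from the ave answer, not something its author proposed, and streak_cummax is a hypothetical name. A minimal sketch, assuming x contains only 0s and 1s and no NA:

```r
# Hypothetical helper: streak length = current index minus index of the last 0.
# Assumes x holds only 0/1 values with no NA.
streak_cummax <- function(x) {
  i <- seq_along(x)
  # i * (x == 0) is the index at each 0 and 0 elsewhere;
  # cummax carries the index of the most recent 0 forward
  # (it stays 0 until the first 0 is seen)
  last_zero <- cummax(i * (x == 0))
  i - last_zero
}

streak_cummax(c(1, 1, 1, 0, 1, 1, 0, 1, 0, 1))
# [1] 1 2 3 0 1 2 0 1 0 1

streak_cummax(c(0, 1, 1, 0, 1, 1, 0, 1, 0, 1))
# [1] 0 1 2 0 1 2 0 1 0 1
```

Because no 0 has been seen before a leading run of 1s, last_zero is 0 there and the streak simply equals the index, which is the case the comment above points out.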
2

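We can use rleid with rowid from data.table: rleid assigns an id to each run of identical values, rowid numbers the elements within each run, and multiplying by x sets the runs of 0 back to 0.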

library(data.table)
rowid(rleid(x)) * x
#[1] 0 1 2 0 1 2 0 1 0 1

data

x <- c(0, 1, 1, 0, 1, 1, 0, 1, 0, 1)
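
If data.table is not available, roughly the same run-based idea can be sketched in base R with rle and sequence. This is an illustrative analogue rather than part of the answer above; streak_rle is a hypothetical name, and the sketch assumes x contains only 0s and 1s.

```r
# Hypothetical base R analogue of rowid(rleid(x)) * x.
# Assumes x holds only 0/1 values.
streak_rle <- function(x) {
  # Lengths of consecutive runs of identical values
  run_lengths <- rle(x)$lengths
  # sequence() counts 1..n within each run; multiplying by x zeroes the 0-runs
  sequence(run_lengths) * x
}

x <- c(0, 1, 1, 0, 1, 1, 0, 1, 0, 1)
streak_rle(x)
# [1] 0 1 2 0 1 2 0 1 0 1
```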
akrun
0

I recommend the runner package and its function streak_run, which counts consecutive occurrences of the same value. It can also compute streaks over sliding windows (e.g., the last 5 observations); see the GitHub documentation for more.

library(runner)
streak <- streak_run(x)
streak[x == 0] <- 0
print(streak)
# [1] 0 1 2 0 1 2 0 1 0 1

Speed comparison with the other solutions:

fun_ave <- function (x) ave(x, cumsum(x==0), FUN = seq_along) - 1
fun_dt  <- function (x) rowid(rleid(x)) * x
run <- function(x) {
  out <- streak_run(x)
  out[x == 0] <- 0
  out
}


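microbenchmark::microbenchmark(
  run(x),
  fun_ave(x),
  fun_dt(x),
  times = 1000L
)

# Note: the timings below were recorded with the bare symbol `run` rather than
# the call `run(x)`, so the `run` row times only a symbol lookup and is not
# comparable with the other two rows.
# Unit: nanoseconds
#        expr    min       lq       mean   median       uq     max neval
#         run     48     58.5    197.676    207.5    250.0   12599  1000
#  fun_ave(x) 122984 137144.0 173577.501 146211.5 193241.5 3243640  1000
#   fun_dt(x)  24954  28959.0  42959.954  36262.5  40843.0 4141624  1000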
GoGonzo