5

I just want to count the numbers of consecutive zero in last run if last run is zero for atomic vector.

For example:

a <- c(1, 0, 0, 0)

So, the number of consecutive zero in last run is 3.

If last run is not zero, then answer must be zero. For example

a <- c(0, 1, 1, 0, 0, 1)

So, answer is zero because in the last run there is one, not zero.

I do not want to use any external package. I manage to write a function that use loop. But I think more efficient method must exist.

    czero <- function(a) {
      k = 0
      for(i in 1:length(a)){
        if(a[i] == 0) {
          k = k + 1 
        } else k = 0
      }
      return(k)
    }
M--
  • 25,431
  • 8
  • 61
  • 93
Neeraj
  • 1,166
  • 9
  • 21
  • Possible duplicate of [Find consecutive sequence of zeros in R](https://stackoverflow.com/questions/15150780/find-consecutive-sequence-of-zeros-in-r) – M-- Sep 17 '19 at 19:50
  • @M-- This is not duplicate question. I checked that question. This question is general, and required not to use any external package. – Neeraj Sep 17 '19 at 19:59
  • @Neeraj there are base solutions there. p.s. I voted once, if the community thinks otherwise, your question won't get closed. Cheers. – M-- Sep 17 '19 at 20:00
  • 1
    OR https://stackoverflow.com/questions/48024347/count-of-consecutive-zeros-in-a-dataframe OR https://stackoverflow.com/questions/10643798/count-consecutive-numbers-in-a-vector OR https://stackoverflow.com/questions/19998836/r-count-consecutive-occurrences-of-values-in-a-single-column – M-- Sep 17 '19 at 20:01

4 Answers4

7

Reverse a and then compute its cumulative sum. The leading 0's will be the only 0's left and ! of that will be TRUE for each and FALSE for other elements. The sum of that is the desired number.

sum(!cumsum(rev(a)))
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • This is much faster. Faster than loop. – Neeraj Sep 17 '19 at 19:39
  • 1
    If speed is of concern, `a` is long and you know that the answer is less than or equal to `k` then you could speed it up by using `tail(a, k)` in place of `a`. If you have no idea what `k` might be you can use `k <- length(a) - sum(a)` . – G. Grothendieck Sep 17 '19 at 23:46
4

The simplest improvement is to start your loop from the end of the vector and work backwards, instead of starting from the front. You can then save time by exiting the loop at the first non-zero element, instead of looping through the whole vector.

I've checked this against the given vectors, and a much longer vector with a small number of zeros at the end, to show a case where looping from the start takes a lot of time.

a <- c(1, 0, 0, 0)
b <- c(0, 1, 1, 0, 0, 1)
long <- rep(c(0, 1, 0, 1, 0), c(4, 6, 5, 10000, 3))

czero is the original function, f1 is the solution by akrun that uses rle, fczero starts the loop from the end, and revczero reverses the vector, then starts from the front.

czero <- function(a) {
  k = 0
  for(i in 1:length(a)){
    if(a[i] == 0) {
      k = k + 1 
    } else k = 0
  }
  return(k)
}

f1 <- function(vec){
  pmax(0, with(rle(vec), lengths[values == 0 &
            seq_along(values) == length(values)])[1], na.rm = TRUE)
}

fczero <- function(vec) {
  k <- 0L
  for (i in length(vec):1) {
    if (vec[i] != 0) break
    k <- k + 1L
  }
  return(k)
}

revczero <- function(vec) {
  revd <- rev(vec)
  k <- 0L
  for (i in 1:length(vec)) {
    if (revd[i] != 0) break
    k <- k + 1L
  }
  return(k)
}

Time benchmarks are below. EDIT: I've also added Grothendieck's version.

microbenchmark::microbenchmark(czero(a), f1(a), fczero(a), revczero(a), sum(!cumsum(rev(a))), times = 1000)

#  Unit: nanoseconds
#                 expr   min    lq      mean median    uq     max neval
#             czero(a)     0   514   621.035    514   515   21076  1000
#                f1(a) 21590 23133 34455.218  27245 30843 3211826  1000
#            fczero(a)     0   514   688.892    514   515   28274  1000
#          revczero(a)  2570  3085  4626.047   3599  4626  112064  1000
# sum(!cumsum(rev(a)))  2056  2571  3879.630   3085  3599   62201  1000
microbenchmark::microbenchmark(czero(b), f1(b), fczero(b), revczero(b), sum(!cumsum(rev(b))), times = 1000)

# Unit: nanoseconds
#                   expr   min    lq      mean median    uq     max neval
#             czero(b)       0   514   809.691    514   515     29815  1000
#                f1(b)   22104 23647 29372.227  24675 26217   1319583  1000
#            fczero(b)       0     0   400.502      0   514     26217  1000
#          revczero(b)    2056  2571  3844.176   3085  3599     99727  1000
# sum(!cumsum(rev(b)))    2056  2570  3592.281   3084  3598.5  107952  1000
microbenchmark::microbenchmark(czero(long), f1(long), fczero(long), revczero(long), sum(!cumsum(rev(long))), times = 1000)

# Unit: nanoseconds
#                  expr    min     lq       mean median       uq     max neval
#             czero(long) 353156 354699 422077.536 383486 443631.0 1106250  1000
#                f1(long) 112579 119775 168408.616 132627 165269.5 2068050  1000
#            fczero(long)      0    514    855.444    514   1028.0   43695  1000
#          revczero(long)  24161  27245  35890.991  29301  36498.0  149591  1000
# sum(!cumsum(rev(long)))  49350  53462  71035.486  56546    71454 2006363  1000
3

We can use rle

f1 <- function(vec){
    pmax(0, with(rle(vec), lengths[values == 0 & 
                 seq_along(values) == length(values)])[1], na.rm = TRUE)

  }

f1(a)
#[1] 3

In the second case,

b <- c(0, 1, 1, 0, 0, 1)
f1(b)
#[1] 0

Or another option is to create a function with which and cumsum

f2 <- function(vec) {
  i1 <- which(!vec)
  if(i1[length(i1)] != length(vec)) 0 else {
     sum(!cumsum(rev(c(TRUE, diff(i1) != 1)))) + 1
    }

 }

f2(a)
f2(b)
akrun
  • 874,273
  • 37
  • 540
  • 662
2

with data.table:

ifelse(last(a) == 0,
       sum(rleid(a) == last(rleid(a))),
       0)

As

> rleid(a)
[1] 1 2 2 2

It is the length of the last group, if the last value is 0

denis
  • 5,580
  • 1
  • 13
  • 40