
I'm trying to work out how to get the number of consecutive zeroes in a given column of a data frame.

Here is a dataframe:

data <- data.frame(id = c(1,1,1,1,1,1,2,2,2,2,2,2), value = c(1,0,0,1,0,0,0,0,0,0,4,3))

This would be the desired output:

id  value consec
1   1     0
1   0     2
1   0     2
1   1     0
1   0     2
1   0     2
2   0     4
2   0     4
2   0     4
2   0     4
2   4     0
2   3     0

Any ideas on how to achieve this output? Many thanks

ekoam
MGJ-123

4 Answers


You can do:

data$consec <- with(data, ave(value, value, cumsum(value != 0), id, FUN = length) - (value != 0)) 

data
   id value consec
1   1     1      0
2   1     0      2
3   1     0      2
4   1     1      0
5   1     0      2
6   1     0      2
7   2     0      4
8   2     0      4
9   2     0      4
10  2     0      4
11  2     4      0
12  2     3      0
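
To see the mechanics (a sketch using the same `data` as in the question): `cumsum(value != 0)` increments at every non-zero value, so each non-zero value and the zeroes that follow it form one group; `ave(..., FUN = length)` then returns the size of each `(value, group, id)` combination, and subtracting `(value != 0)` zeroes out the non-zero rows:

```r
data <- data.frame(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                   value = c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 4, 3))

# Run counter: bumps at every non-zero value, so a non-zero value
# and the zeroes following it share one group id.
grp <- cumsum(data$value != 0)
grp
#> [1] 1 1 1 2 2 2 2 2 2 2 3 4

# Size of each (value, grp, id) group; subtracting (value != 0)
# turns the non-zero rows' own count of 1 into 0.
consec <- ave(data$value, data$value, grp, data$id, FUN = length) -
  (data$value != 0)
consec
#> [1] 0 2 2 0 2 2 4 4 4 4 0 0
```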
Ritchie Sacramento
    A variant: `with(data, ave(value==0, value, cumsum(value != 0), id, FUN = sum))` – GKi Dec 07 '20 at 12:34

Here's a base R solution using interaction and rle (run-length encoding):

rlid <- rle(as.numeric(interaction(data$id, data$value)))$lengths
data$consec <- replace(rep(rlid, rlid), data$value != 0, 0)

data
#>    id value consec
#> 1   1     1      0
#> 2   1     0      2
#> 3   1     0      2
#> 4   1     1      0
#> 5   1     0      2
#> 6   1     0      2
#> 7   2     0      4
#> 8   2     0      4
#> 9   2     0      4
#> 10  2     0      4
#> 11  2     4      0
#> 12  2     3      0
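
For intuition (same `data` as in the question): `interaction(id, value)` assigns each `(id, value)` pair its own factor level, and `rle()` measures the length of every run of consecutive identical levels; `rep(rlid, rlid)` then stretches each run length back over its run, and `replace()` zeroes the non-zero rows:

```r
data <- data.frame(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                   value = c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 4, 3))

# One factor level per (id, value) pair, then run-length encode it:
# runs of identical levels are exactly runs of equal value within an id.
runs <- rle(as.numeric(interaction(data$id, data$value)))
runs$lengths
#> [1] 1 2 1 2 4 1 1

# Stretch each run length over its run, then zero the non-zero rows.
consec <- replace(rep(runs$lengths, runs$lengths), data$value != 0, 0)
consec
#> [1] 0 2 2 0 2 2 4 4 4 4 0 0
```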
Allan Cameron

This dplyr solution will work. Taking the cumulative sum of `value == 1` within each `id` gives a flag that increments every time a 1 occurs, so each run of zeroes falls into its own group; we then count the zeroes per group:

data %>% 
  group_by(id) %>% 
  mutate(flag_0 = cumsum(value == 1)) %>% 
  group_by(id, flag_0) %>% 
  mutate(conseq = ifelse(value == 0, sum(value == 0), 0)) %>% 
  ungroup()

# A tibble: 12 x 4
      id value flag_0 conseq
   <dbl> <dbl>  <int> <dbl>
 1     1     1      1     0
 2     1     0      1     2
 3     1     0      1     2
 4     1     1      2     0
 5     1     0      2     2
 6     1     0      2     2
 7     2     0      0     4
 8     2     0      0     4
 9     2     0      0     4
10     2     0      0     4
11     2     4      0     0
12     2     3      0     0
pieterbons

This tidyverse approach can also do the job:

library(tidyverse)

data %>% group_by(id) %>% 
  mutate(value2 =cumsum(value)) %>% group_by(id, value, value2) %>%
  mutate(consec = ifelse(value == 0, n(), 0)) %>%
  ungroup() %>% select(-value2)

# A tibble: 12 x 3
      id value consec
   <dbl> <dbl>  <dbl>
 1     1     1      0
 2     1     0      2
 3     1     0      2
 4     1     1      0
 5     1     0      2
 6     1     0      2
 7     2     0      4
 8     2     0      4
 9     2     0      4
10     2     0      4
11     2     4      0
12     2     3      0
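
The key observation (a base R sketch of the same per-`id` running sum as the `mutate(value2 = cumsum(value))` step above): zeroes leave the cumulative sum unchanged, so within each `id` every run of zeroes shares a single `value2`, which is what lets `group_by(id, value, value2)` isolate each run:

```r
data <- data.frame(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                   value = c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 4, 3))

# Per-id running sum: constant across a run of zeroes, bumped by
# every non-zero value, so each zero run gets its own label.
value2 <- ave(data$value, data$id, FUN = cumsum)
value2
#> [1] 1 1 1 2 2 2 0 0 0 0 4 7
```

(One caveat: this labelling relies on the running sum actually changing between zero runs, which holds whenever the intervening non-zero values are all positive.)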
AnilGoyal
    Since you're using `dplyr` already, I encourage the use of `if_else` over `ifelse` (for [several reasons](https://stackoverflow.com/q/6668963/3358272)). The only catch (which is a feature, imo) is that it requires one to think about the `class` of both return values. In this case, you are technically mixing `integer` (`n()`) with `numeric`; a more class-safe approach would use `0L` so that both returns are `integer`. (While this feature may seem inconvenient, consider `ifelse(cond, oldval, "nope")`, and realize that if `oldval` is numeric, base R will completely mask its conversion to chr.) – r2evans Jan 28 '21 at 12:29
    Thanks for the tip. I actually realized this from your comment on some other question. :) – AnilGoyal Jan 28 '21 at 13:49