Count of groups based on the runs of a variable

Question

These data are made up; the real data are more complicated.

df.1 <- data.frame(value = c(10, 12, 15, 14, 13, 0, 0, 14, 7, 13, 0, 14, 14, 0, 0, 0, 0, 4, 2, 10, 12))

I want to count consecutive observations within each run of non-zero values and within each run of zeros. I found an answer by @allan-cameron that is mostly helpful, under this question:

Count and summation of positive and negative number sequences

count_and_sum <- function(x){
  runs   <- rle((x > 0) * 1)$lengths
  groups <- split(x, rep(1:length(runs), runs))
  output <- function(group) data.frame(x = group, n.obs = seq_along(group), suma = cumsum(group))
  result <- as.data.frame(do.call(rbind, lapply(groups, output)))
  `rownames<-`(result, 1:nrow(result))
}

It gives the results I expect:

> df.1
   value  x n.obs suma
1     10 10     1   10
2     12 12     2   22
3     15 15     3   37
4     14 14     4   51
5     13 13     5   64
6      0  0     1    0
7      0  0     2    0
8     14 14     1   14
9      7  7     2   21
10    13 13     3   34
11     0  0     1    0
12    14 14     1   14
13    14 14     2   28
14     0  0     1    0
15     0  0     2    0
16     0  0     3    0
17     0  0     4    0
18     4  4     1    4
19     2  2     2    6
20    10 10     3   16
21    12 12     4   28

I don't know how to change the function so that it also returns a group number for each run. Expected result:

> df.1
   value  x n.obs suma nr.group
1     10 10     1   10        1
2     12 12     2   22        1
3     15 15     3   37        1
4     14 14     4   51        1
5     13 13     5   64        1
6      0  0     1    0        2
7      0  0     2    0        2
8     14 14     1   14        3
9      7  7     2   21        3
10    13 13     3   34        3
11     0  0     1    0        4
12    14 14     1   14        5
13    14 14     2   28        5
14     0  0     1    0        6
15     0  0     2    0        6
16     0  0     3    0        6
17     0  0     4    0        6
18     4  4     1    4        7
19     2  2     2    6        7
20    10 10     3   16        7
21    12 12     4   28        7

Andre Wildberg · Accepted Answer · 2023-04-23T11:59:45.390

Something like this, using consecutive_id() from dplyr >= 1.1.0:

library(dplyr)

df.1 %>% 
  group_by(nr.group = consecutive_id(value != 0)) %>% 
  mutate(n.obs = row_number(), suma = cumsum(value)) %>% 
  ungroup()
# A tibble: 21 × 4
   value nr.group n.obs  suma
   <dbl>    <int> <dbl> <dbl>
 1    10        1     1    10
 2    12        1     2    22
 3    15        1     3    37
 4    14        1     4    51
 5    13        1     5    64
 6     0        2     1     0
 7     0        2     2     0
 8    14        3     1    14
 9     7        3     2    21
10    13        3     3    34
# … with 11 more rows
# ℹ Use `print(n = ...)` to see more rows
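
If you would rather stay in base R (or are on an older dplyr), here is a minimal sketch that returns the run index along with the other columns. The name count_sum_group is made up for illustration and is not from the linked answer. It splits on x != 0, as the dplyr code does, rather than x > 0, so it assumes negative values should count as non-zero.

```r
# Toy data from the question.
df.1 <- data.frame(value = c(10, 12, 15, 14, 13, 0, 0, 14, 7, 13, 0,
                             14, 14, 0, 0, 0, 0, 4, 2, 10, 12))

# Hypothetical helper (not from the original posts): like count_and_sum(),
# but it also returns the run index as nr.group.
count_sum_group <- function(x) {
  runs     <- rle(x != 0)$lengths               # lengths of zero / non-zero runs
  nr.group <- rep(seq_along(runs), runs)        # run index, repeated for each row
  data.frame(
    x        = x,
    n.obs    = sequence(runs),                  # 1, 2, ... restarting in each run
    suma     = ave(x, nr.group, FUN = cumsum),  # running sum within each run
    nr.group = nr.group
  )
}

res <- count_sum_group(df.1$value)
res
```

Because the run lengths from rle() are computed once and reused, there is no need to split the vector and rbind the pieces back together.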
