Count the number of NA values in a row - reset when 0

Question

I came across the question "Cumulative sum that resets when 0 is encountered" (https://stackoverflow.com/a/32502162/13269143), which answered part of my problem. First, I wanted a column that holds the running sum of column b within each sequence of values separated by a 0. I achieved this with:

setDT(df)[, whatiwant := cumsum(b), by = rleid(b == 0L)]

as suggested in https://stackoverflow.com/a/32502162/13269143 (the other solutions provided there did not work for me; they only produced NA values). Now I also want a third column, "What I Want" in the illustration below, that assigns the maximum accumulated value of each sequence to every observation in that sequence. To illustrate:

b     Accumulated   What I Want
1      1            3
1      2            3
1      3            3
0      0            0
1      1            4
1      2            4
1      3            4
1      4            4
0      0            0
0      0            0
0      0            0
1      1            2
1      2            2

There might be a very simple way to do this. Thank you in advance.
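For anyone who wants to reproduce the table above, here is a minimal base R sketch that rebuilds the example data. It is not the data.table call from the question: it assumes a plain data.frame and uses ave() with cumsum(b == 0) as the grouping, which starts a new group at every 0.

```r
# Example vector taken from the illustration above.
b <- c(1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1)

# Assumption: a new group starts at every 0, so the running sum
# restarts there. This is a base R stand-in for the rleid() grouping.
grp <- cumsum(b == 0)

# Running sum of b within each group.
df <- data.frame(b = b, Accumulated = ave(b, grp, FUN = cumsum))
df
```

The resulting Accumulated column should match the second column of the illustration.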

score 1 · Answer 1 · answered May 20 '20 at 11:56

You can use rle and inverse.rle like:

b <- c(1,1,1,0,1,1,1,1,0,0,0,1,1)

x <- rle(b)
i <- x$values == 1
x$values[i] <- x$lengths[i]
inverse.rle(x)
# [1] 3 3 3 0 4 4 4 4 0 0 0 2 2
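To see why this works, it may help to look at what rle() returns before the values are overwritten. The sketch below only inspects the intermediate object; note that replacing values by lengths gives the run sum only because every non-zero entry in this example is 1.

```r
b <- c(1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1)

# rle() compresses b into runs of identical values.
x <- rle(b)
x$lengths  # 3 1 4 3 2  (how long each run is)
x$values   # 1 0 1 0 1  (which value each run repeats)

# inverse.rle() expands the runs again, so an untouched rle object
# reproduces the original vector.
identical(inverse.rle(x), b)  # TRUE
```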

answered May 20 '20 at 11:56

GKi


score 1 · Accepted Answer · answered May 20 '20 at 11:57

You can use max instead of cumsum in your attempt:

library(data.table)
setDT(df)[, whatiwant := max(Accumulated), by = rleid(b == 0L)]
df

#    b Accumulated whatiwant
# 1: 1           1         3
# 2: 1           2         3
# 3: 1           3         3
# 4: 0           0         0
# 5: 1           1         4
# 6: 1           2         4
# 7: 1           3         4
# 8: 1           4         4
# 9: 0           0         0
#10: 0           0         0
#11: 0           0         0
#12: 1           1         2
#13: 1           2         2
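If data.table is not available, roughly the same grouping can be sketched in base R. The names run_id and r are illustrative only; run_id is meant to mimic the run identifiers that rleid(b == 0L) produces, built here from rle().

```r
b <- c(1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1)
Accumulated <- c(1, 2, 3, 0, 1, 2, 3, 4, 0, 0, 0, 1, 2)

# Number the consecutive runs of "is zero" / "is not zero":
# 1 1 1 2 3 3 3 3 4 4 4 5 5
r <- rle(b == 0)
run_id <- rep(seq_along(r$lengths), times = r$lengths)

# Broadcast the maximum of each run to every row in that run.
whatiwant <- ave(Accumulated, run_id, FUN = max)
whatiwant
```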

score 0 · Answer 3 · answered May 20 '20 at 12:15

You can use the rle() function to get the run lengths and then mapply() to turn its return value into the vector you want:

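library(dplyr)  # provides tibble(), mutate() and %>%

# Repeat each run length once per element of its run, then zero out the rows where b is 0.
d <- tibble(b=c(1,1,1,0,1,1,1,1,0,0,0,1,1),
            WhatIWant=unlist(mapply(rep, rle(b)$lengths, rle(b)$lengths))) %>%
    mutate(WhatIWant=ifelse(b == 0, 0, WhatIWant))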

Gives

# A tibble: 13 x 2
       b WhatIWant
   <dbl>     <dbl>
 1     1         3
 2     1         3
 3     1         3
 4     0         0
 5     1         4
 6     1         4
 7     1         4
 8     1         4
 9     0         0
10     0         0
11     0         0
12     1         2
13     1         2
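As a possible simplification, rep() is already vectorized over its times argument, so the mapply()/unlist() step could probably be dropped. A base R sketch of that variant (the name run_len is hypothetical):

```r
b <- c(1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1)

# Compute the runs once and repeat each run length over its own run.
r <- rle(b)
run_len <- rep(r$lengths, times = r$lengths)

# Rows where b is 0 get 0 instead of the length of the zero run.
WhatIWant <- ifelse(b == 0, 0, run_len)
WhatIWant
```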

Three simultaneous answers! Take your pick. :) – Limey May 20 '20 at 12:16
