I came across the question "Cumulative sum that resets when 0 is encountered" (https://stackoverflow.com/a/32502162/13269143), which answered part of my problem but not all of it. I first wanted a column that accumulates, row by row, the values of each run in column b, where runs are separated by a 0. I achieved this with:

setDT(df)[, Accumulated := cumsum(b), by = rleid(b == 0L)]

as suggested in https://stackoverflow.com/a/32502162/13269143 (the other solutions posted there did not work for me; they produced only NA values). Now I also want a third column, "What I Want" in the illustration below, that assigns to every row of a run the maximum accumulated value reached in that run. To illustrate:

b     Accumulated   What I Want
1      1            3
1      2            3
1      3            3
0      0            0
1      1            4
1      2            4
1      3            4
1      4            4
0      0            0
0      0            0
0      0            0
1      1            2
1      2            2

There might be a very simple way to do this. Thank you in advance.

Cec SK

3 Answers

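You can use rle and inverse.rle. Because b contains only 0s and 1s, the length of each run of 1s equals the maximum accumulated value of that run: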

b <- c(1,1,1,0,1,1,1,1,0,0,0,1,1)

x <- rle(b)
i <- x$values == 1
x$values[i] <- x$lengths[i]
inverse.rle(x)
# [1] 3 3 3 0 4 4 4 4 0 0 0 2 2
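
If b can hold values other than 0 and 1, the run length no longer equals the run total, so the shortcut above would not apply. A minimal base R sketch for that case, assuming runs are still separated by zeros, is shown here. The names run_id, accumulated and what_i_want are illustrative and not part of the original answer.

```r
b <- c(1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1)

# Give every run of zeros / non-zeros its own id (a base R stand-in for rleid)
r <- rle(b == 0)
run_id <- rep(seq_along(r$lengths), r$lengths)

# Running total inside each run, then the largest total of the run on every row
accumulated <- ave(b, run_id, FUN = cumsum)
what_i_want <- ave(accumulated, run_id, FUN = max)

what_i_want
```

With the example vector this should give the same result as the inverse.rle approach. With an input such as c(2, 3, 0, 5), the same code should return 5 5 0 5.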
GKi

You can use max instead of cumsum in your attempt, applied to the Accumulated column:

library(data.table)
setDT(df)[, whatiwant := max(Accumulated), by = rleid(b == 0L)]
df

#    b Accumulated whatiwant
# 1: 1           1         3
# 2: 1           2         3
# 3: 1           3         3
# 4: 0           0         0
# 5: 1           1         4
# 6: 1           2         4
# 7: 1           3         4
# 8: 1           4         4
# 9: 0           0         0
#10: 0           0         0
#11: 0           0         0
#12: 1           1         2
#13: 1           2         2
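
Since df itself is never defined in the question, here is a self-contained sketch that rebuilds it from the b column and makes the grouping explicit. It assumes the data.table package is installed. The helper column run is added only for illustration and is not part of the original code.

```r
library(data.table)

# Example data from the question (b is assumed to be the only input column)
df <- data.table(b = c(1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1))

# rleid() numbers consecutive runs; b == 0L separates zero runs from non-zero runs
df[, run := rleid(b == 0L)]

# Running total within each run, then the maximum of that total on every row
df[, Accumulated := cumsum(b), by = run]
df[, whatiwant := max(Accumulated), by = run]
df
```

Printing the run column next to b shows why the grouping works: every block of zeros and every block of non-zero values gets its own id, so max() is evaluated once per block.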
Ronak Shah

You can use the rle() function to get the run lengths and then mapply() to turn its return value into the vector you want:

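library(dplyr)  # provides tibble(), mutate() and %>%

# Repeat each run length once per row of its run, then zero out the rows where b is 0
d <- tibble(b=c(1,1,1,0,1,1,1,1,0,0,0,1,1),
            WhatIWant=unlist(mapply(rep, rle(b)$lengths, rle(b)$lengths))) %>%
    mutate(WhatIWant=ifelse(b == 0, 0, WhatIWant))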

Gives

# A tibble: 13 x 2
       b WhatIWant
   <dbl>     <dbl>
 1     1         3
 2     1         3
 3     1         3
 4     0         0
 5     1         4
 6     1         4
 7     1         4
 8     1         4
 9     0         0
10     0         0
11     0         0
12     1         2
13     1         2
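
The mapply(rep, ...) step can also be written as a single vectorised rep() call, which avoids the unlist(). A base R sketch of the same idea follows, with illustrative variable names. Like the code above, it assumes b contains only 0s and 1s, so that the run length equals the run total.

```r
b <- c(1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1)

# Run lengths, computed once
r <- rle(b)

# rep() is vectorised over its second argument: each length is repeated "length" times
run_len <- rep(r$lengths, r$lengths)

# Rows belonging to a run of zeros get 0 instead of the run length
what_i_want <- ifelse(b == 0, 0, run_len)
what_i_want
```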
Limey