I have a flag column that contains runs of consecutive 1s and 0s. I want a running count within each run of 1s: when a 0 is encountered the count should stop (result 0), and for the next run of 1s it should start afresh from 1.

I have tried `cumsum(negread_flag == 1)`, but it keeps summing after the 0s instead of restarting. The expected output is:

negread_flag   result
1               1
1               2 
1               3  
1               4 
0               0 
0               0
0               0
1               1
1               2
1               3
0               0
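To see why the `cumsum()` attempt falls short, here is a minimal sketch on the same flag values, typed in as a plain vector `x` (an illustrative name): `cumsum()` never resets, so the total stays flat through the 0s and then carries on from where it stopped.

```r
# the flag values from the table above, as a plain integer vector
x <- c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)

# cumsum() accumulates over the whole vector; the 0s add nothing but do not reset it
attempt <- cumsum(x == 1)
print(attempt)
# [1] 1 2 3 4 4 4 4 5 6 7 7
```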
@Henrik The dupe link doesn't give me the expected output: `with(df1, (!negread_flag) * unlist(lapply(rle(negread_flag)$lengths, seq_len)))` returns `[1] 0 0 0 0 1 2 3 0 0 0 1` – akrun Apr 17 '19 at 11:56


We can use `rleid` (run-length id, which generates a new id whenever adjacent elements differ) as a grouping variable, take the sequence within each group, and assign it to 'result' where 'negread_flag' is 1. Finally, remove the 'grp' column by assigning it `NULL`.

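library(data.table)
setDT(df1)[, grp := rleid(negread_flag)          # one id per run of equal values
         ][, result := 0L                          # default: 0 everywhere
         ][negread_flag == 1, result := seq_len(.N), by = grp   # count rows within each run of 1s
         ][, grp := NULL                           # drop the helper column
         ][]                                       # print the updated table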
#     negread_flag result
# 1:            1      1
# 2:            1      2
# 3:            1      3
# 4:            1      4
# 5:            0      0
# 6:            0      0
# 7:            0      0
# 8:            1      1
# 9:            1      2
#10:            1      3
#11:            0      0
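For intuition about the grouping variable: `rleid(negread_flag)` gives each run of equal values its own id. A base R sketch that produces the same ids for this example (assuming a numeric vector with no `NA`s; `run_id` is an illustrative name, not part of data.table) starts a new id wherever the value changes:

```r
x <- c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)

# TRUE at the first element and wherever the value differs from the previous one;
# cumsum() turns those run starts into consecutive ids
run_id <- cumsum(c(TRUE, diff(x) != 0))
print(run_id)
# [1] 1 1 1 1 2 2 2 3 3 3 4
```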

Or, the same idea with the tidyverse: group by `rleid` (from data.table) and create 'result' by multiplying `row_number()` by 'negread_flag', so that rows where 'negread_flag' is 0 become 0.

library(tidyverse)
df1 %>%
   group_by(grp = data.table::rleid(negread_flag)) %>%  # one group per run
   mutate(result = row_number() * negread_flag) %>%     # position in run, zeroed for the 0s
   ungroup() %>%
   select(-grp)
# A tibble: 11 x 2
#   negread_flag result
#          <int>  <int>
# 1            1      1
# 2            1      2
# 3            1      3
# 4            1      4
# 5            0      0
# 6            0      0
# 7            0      0
# 8            1      1
# 9            1      2
#10            1      3
#11            0      0
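The same multiply-by-the-flag trick also works without dplyr. A minimal base R sketch (with illustrative names `x`, `run_id`, `pos`) uses `ave()` to number the rows within each run, the analogue of `row_number()` inside `group_by()`, and then zeroes out the runs of 0s:

```r
x <- c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)
run_id <- cumsum(c(TRUE, diff(x) != 0))   # one id per run, like rleid() here

# position within each run
pos <- ave(x, run_id, FUN = seq_along)

# multiplying by the flag turns the positions inside runs of 0s into 0
result <- pos * x
print(result)
# [1] 1 2 3 4 0 0 0 1 2 3 0
```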

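Or using base R: create 'result' filled with 0 first, then fill in only the rows where 'negread_flag' is 1

df1$result <- 0L                    # default for the runs of 0s
i1 <- df1$negread_flag != 0
df1$result[i1] <- with(rle(df1$negread_flag), sequence(lengths * values))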
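To see why `sequence(lengths * values)` lines up with `i1`, here are the intermediate values for the example data (a sketch on a plain vector `x`): multiplying the run lengths by the run values keeps the lengths of the runs of 1s and turns the runs of 0s into length 0, so `sequence()` only produces counts for the 1s, exactly as many as there are `TRUE`s in `i1`.

```r
x <- c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)
r <- rle(x)

r$lengths              # 4 3 3 1
r$values               # 1 0 1 0
r$lengths * r$values   # 4 0 3 0: runs of 0s contribute length 0

counts <- sequence(r$lengths * r$values)
print(counts)
# [1] 1 2 3 4 1 2 3

# fill a zero vector only at the positions of the 1s
result <- integer(length(x))
result[x != 0] <- counts
print(result)
# [1] 1 2 3 4 0 0 0 1 2 3 0
```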

Or, as @markus suggested in the comments (no `i1` indexing here, because the product is already 0 for the runs of 0s):

df1$result <- sequence(rle(df1$negread_flag)$lengths) * df1$negread_flag
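Since the question started from `cumsum()`, a different base R technique (not one of the approaches in this answer) is to keep `cumsum()` but restart it at every 0. `cumsum(x == 0)` steps up at each 0, so each group begins with a 0 followed by the next run of 1s, and a grouped `cumsum()` then resets on its own. A minimal sketch with `ave()`, assuming the flag only ever holds 0 or 1:

```r
x <- c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)

# group ids that step up at every 0: 0 0 0 0 1 2 3 3 3 3 4
grp <- cumsum(x == 0)

# running total within each group; the 0 that opens a group contributes nothing
result <- ave(x, grp, FUN = cumsum)
print(result)
# [1] 1 2 3 4 0 0 0 1 2 3 0
```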

data

df1 <- structure(list(negread_flag = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 
  1L, 1L, 0L)), row.names = c(NA, -11L), class = "data.frame")
akrun
@akrun: great answer! I used the code provided in this answer today and it solved my problem exactly! Thank you so much! – stats_noob Dec 24 '22 at 05:40