I searched for an answer to my specific problem but could not find one.

I have a data frame with the following data:

ID  a
1   0
2   0
3   1
4   1
5   1
6   1
7   0
8   1
9   1
10  0
11  1
12  0
13  0

Now I want to add a column "b" that counts up from the previous value of b while a == 1 and resets to 0 when a == 0.

The result should look like this:

ID  a   b
1   0   0
2   0   0
3   1   1
4   1   2
5   1   3
6   1   4
7   0   0
8   1   1
9   1   2
10  0   0
11  1   1
12  0   0
13  0   0
14  1   1
15  1   2
16  1   3
17  1   4

Thanks in advance!

Quý

3 Answers

Here is one approach that uses rleid() from data.table to create a grouping variable for ave(). We then calculate cumsum() per group, which is 0 whenever a == 0.

library(data.table)
df$new_b <- with(df, ave(a, rleid(a), FUN = cumsum))
df
#   ID a b new_b
#1   1 0 0     0
#2   2 0 0     0
#3   3 1 1     1
#4   4 1 2     2
#5   5 1 3     3
#6   6 1 4     4
#7   7 0 0     0
#8   8 1 1     1
#9   9 1 2     2
#10 10 0 0     0
#11 11 1 1     1
#12 12 0 0     0
#13 13 0 0     0
#14 14 1 1     1
#15 15 1 2     2
#16 16 1 3     3
#17 17 1 4     4
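
To see what the grouping variable looks like, here is a minimal base R sketch that does not need data.table. The name run_id is made up for illustration, and cumsum(c(TRUE, diff(a) != 0)) is only assumed to match rleid(a) for a single numeric vector like this one.

```r
# Column a from the question (first 13 rows)
a <- c(0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0)

# Start a new run id every time the value of a changes
# (hypothetical stand-in for data.table::rleid(a))
run_id <- cumsum(c(TRUE, diff(a) != 0))

# Cumulative sum within each run: runs of 0 stay 0, runs of 1 count up
b <- ave(a, run_id, FUN = cumsum)

print(data.frame(a, run_id, b))
```

Each run of identical values gets its own id, so the counter restarts whenever a switches between 0 and 1.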

Once data.table is loaded you could also do

setDT(df)[, new_b := cumsum(a), rleid(a)][]

data

df <- structure(list(ID = 1:17, a = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 
1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L), b = c(0L, 0L, 1L, 2L, 3L, 
4L, 0L, 1L, 2L, 0L, 1L, 0L, 0L, 1L, 2L, 3L, 4L)), .Names = c("ID", 
"a", "b"), class = "data.frame", row.names = c(NA, -17L))
markus

How about the following, using base R's rle():

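# Compute the runs once; SIMPLIFY = FALSE keeps mapply() from returning a
# matrix when all runs happen to have the same length
r <- rle(df$a)
df$b <- unlist(mapply(
    function(len, val) if (val == 0) rep(0, len) else seq_len(len),
    r$lengths, r$values, SIMPLIFY = FALSE))
df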
#   ID a b
#1   1 0 0
#2   2 0 0
#3   3 1 1
#4   4 1 2
#5   5 1 3
#6   6 1 4
#7   7 0 0
#8   8 1 1
#9   9 1 2
#10 10 0 0
#11 11 1 1
#12 12 0 0
#13 13 0 0
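
For reference, a small sketch of what rle() returns for this column, plus a more compact variant of the same idea. The variant uses sequence() and relies on the assumption that a only contains 0 and 1, so multiplying by a zeroes out the runs of 0; it is an alternative, not the code from the answer above.

```r
# Column a from the question
a <- c(0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0)

# rle() splits the vector into runs: lengths of each run and the repeated value
r <- rle(a)
print(r$lengths)
print(r$values)

# sequence() counts 1..n within every run; multiplying by a (assumed 0/1)
# resets the counter to 0 wherever a == 0
b <- sequence(r$lengths) * a
print(b)
```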

Sample data

df <- read.table(text =
    "ID  a
1   0
2   0
3   1
4   1
5   1
6   1
7   0
8   1
9   1
10  0
11  1
12  0
13  0", header = TRUE)
Maurits Evers

With dplyr, one option is to group on cumsum(a == 0). Each group then starts with a row where a == 0 (if one is available) and contains all the a == 1 rows that follow it. Within each group, lag(cumsum(a == 1)) + 1 provides the expected count for the a == 1 rows.

library(dplyr)

df %>% group_by(grp = cumsum(a==0)) %>%
  mutate(b = ifelse(a==1, lag(cumsum(a==1))+1,0)) %>% 
  ungroup() %>% 
  select(-grp) %>%
  as.data.frame()

#    ID a b
# 1   1 0 0
# 2   2 0 0
# 3   3 1 1
# 4   4 1 2
# 5   5 1 3
# 6   6 1 4
# 7   7 0 0
# 8   8 1 1
# 9   9 1 2
# 10 10 0 0
# 11 11 1 1
# 12 12 0 0
# 13 13 0 0
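
The grouping idea can also be sketched in base R, which makes it easy to inspect the groups. This is a simplified variant rather than the pipeline above: it assumes a only contains 0 and 1 and takes a plain cumsum() per group instead of lag(), so it also gives a count when the data starts with a == 1 and the first group has no leading 0 row. The helper name count_runs is made up for illustration.

```r
# Hypothetical helper: count consecutive 1s, resetting at every 0
count_runs <- function(a) {
  grp <- cumsum(a == 0)          # every 0 opens a new group
  ave(a, grp, FUN = cumsum)      # the leading 0 contributes 0, the 1s count up
}

# Column a from the question
a <- c(0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0)
print(count_runs(a))

# Edge case: data that starts with a == 1
print(count_runs(c(1, 1, 0, 1)))
```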

Data:

df <- read.table(text="
ID  a
1   0
2   0
3   1
4   1
5   1
6   1
7   0
8   1
9   1
10  0
11  1
12  0
13  0",
header = TRUE, stringsAsFactors = FALSE)
MKR