I would like to do something very similar to this question: Increment by 1 for every change in column.

But I want to restart the counter when var1 = c. Using

df$var2 <- with(rle(as.character(df$var1)), rep(seq_along(values), lengths))

gives the var2 column below; the "Should be" column is what I want:

var1 var2 Should be
   a    1   1
   a    1   1
   1    2   2
   0    3   3
   b    4   4
   b    4   4
   b    4   4
   c    5   1
   1    6   2
   1    6   2
r2evans
Slubee

2 Answers

In data.table you can use rleid to get a run-length ID for var1 within each group, where cumsum(var1 == "c") starts a new group at every "c".

library(data.table)

setDT(df)
df[, var2 := rleid(var1), by = cumsum(var1 == "c")]
df

#    var1 var2
# 1:    a    1
# 2:    a    1
# 3:    1    2
# 4:    0    3
# 5:    b    4
# 6:    b    4
# 7:    b    4
# 8:    c    1
# 9:    1    2
#10:    1    2
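
To see why the counter restarts, it can help to look at the grouping key on its own. This is a minimal sketch, assuming var1 is held as a plain character vector with the same values as in the question; `grp` is just an illustrative name.

```r
# Same values as the question's var1, stored as plain character here (assumption)
var1 <- c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")

# var1 == "c" is TRUE only at each "c"; the running sum goes up by 1 there
# and stays constant otherwise, so every "c" opens a new group
grp <- cumsum(var1 == "c")
grp
# [1] 0 0 0 0 0 0 0 1 1 1
```

Rows 1 to 7 fall in group 0 and rows 8 to 10 in group 1, so any run counter computed per group starts again at 1 on the "c" row.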

And using dplyr:

library(dplyr)

df %>%
  group_by(group = cumsum(var1 == "c")) %>%
  mutate(var2 = cumsum(var1 != lag(var1, default = first(var1))) + 1)

data

df <- structure(list(var1 = structure(c(3L, 3L, 2L, 1L, 4L, 4L, 4L, 
5L, 2L, 2L), .Label = c("0", "1", "a", "b", "c"), class = "factor")), 
class = "data.frame", row.names = c(NA, -10L))
Ronak Shah
  • Using dplyr I get the error below: Error: Column `+...` must be length 24 (the group size) or one, not 192840 – Slubee Dec 05 '19 at 00:21
  • @Slubee - I can confirm the above code works in dplyr. Are you sure you are referencing the right variables? – thelatemail Dec 05 '19 at 00:25
  • Not sure what might be wrong then, because like @thelatemail it works for me too. Here are a few steps to check. 1) Does the `data.table` solution work for you? 2) Try referencing it with the package name explicitly: `dplyr::mutate`. 3) Try it in a fresh R session with only `dplyr` loaded. – Ronak Shah Dec 05 '19 at 00:55

We can use the OP's rle code in base R, applied within each group by ave:

df$var2 <- with(df,  as.integer(ave(as.character(var1), cumsum(var1 == 'c'), 
       FUN = function(x) with(rle(x), rep(seq_along(values), lengths)))))
df$var2
#[1] 1 1 2 3 4 4 4 1 2 2
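
As a rough illustration of what the inner function does for a single group, here is the rle step applied to the rows before the first "c". The vector `x` and the name `r` are illustrative only, and the values are assumed to be plain character, as they are after as.character(var1).

```r
# Rows 1 to 7 of var1 (the group before the first "c"), as character
x <- c("a", "a", "1", "0", "b", "b", "b")

# rle() collapses consecutive repeats into run values and run lengths
r <- rle(x)
r$values
# [1] "a" "1" "0" "b"
r$lengths
# [1] 2 1 1 3

# Number the runs 1, 2, 3, ... and repeat each number for the length of its run
run_id <- rep(seq_along(r$values), r$lengths)
run_id
# [1] 1 1 2 3 4 4 4
```

ave applies this same computation to each cumsum(var1 == 'c') group separately, which is why the numbering starts over at the "c" row.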

data

df <- structure(list(var1 = structure(c(3L, 3L, 2L, 1L, 4L, 4L, 4L, 
5L, 2L, 2L), .Label = c("0", "1", "a", "b", "c"), class = "factor")), 
class = "data.frame", row.names = c(NA, 
-10L))
akrun