
I have the following dataset:

id = c("A","A","A","A","A","B", "B", "B", "B")
result = c(1,1,0,1,1,0,1,0,1)
my_data = data.frame(id, result)

For each unique id, I want to create a "counter variable" that:

  • starts at 1 if the first result value is 1, otherwise at 0
  • increases by 1 each time result = 1
  • resets to 0 when result = 0
  • stays at 0 until the next result = 1 is encountered
  • resumes increasing by 1 each time result = 1
  • reinitializes when the next unique id is encountered: 1 if result = 1, otherwise 0

I think the final result should look something like this:

   id result counter
1  A      1       1
2  A      1       2
3  A      0       0
4  A      1       1
5  A      1       2
6  B      0       0
7  B      1       1
8  B      0       0
9  B      1       1

I have these two snippets that I am trying to use:

# creates the counter by treating the entire dataset as a single id
# (the column is named `result`, not `results`)
my_data$counter = unlist(lapply(split(my_data$result, c(0, cumsum(abs(diff(!my_data$result == 1))))), function(x) (x[1] == 1) * seq_along(x)))

# creates the counter separately within each id
my_data$counter = ave(my_data$result, my_data$id, FUN = function(x) { tmp <- cumsum(x); tmp - cummax((!x) * tmp) })

But I am not sure how to interpret these correctly. I am also interested in learning how to write a general function that accomplishes this task under arbitrary conditions, e.g. if result = AAA then the counter restarts at 0, if result = BBB then counter + 1, if result = CCC then counter + 2, if result = DDD then counter - 1.

Can someone please show me how to do this?

Thanks!

zephryl
stats_noob

1 Answer


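We can create a run-length id with rleid() (from data.table) and then group by both 'id' and that run id. Within each run, the row number multiplied by 'result' gives the counter, because runs where result = 0 are multiplied down to 0.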

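library(dplyr)
library(data.table)  # for rleid()
my_data %>%
  group_by(id) %>%
  # number each run of identical consecutive values within an id
  mutate(grp = rleid(result)) %>%
  group_by(grp, .add = TRUE) %>%
  # position within the run, zeroed out for runs of 0
  mutate(counter = row_number() * result) %>%
  ungroup() %>%
  select(-grp)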

Output:

# A tibble: 9 × 3
  id    result counter
  <chr>  <dbl>   <dbl>
1 A          1       1
2 A          1       2
3 A          0       0
4 A          1       1
5 A          1       2
6 B          0       0
7 B          1       1
8 B          0       0
9 B          1       1

Or using data.table

library(data.table)
setDT(my_data)[, counter := seq_len(.N) * result, .(id, rleid(result))]

Output:

> my_data
   id result counter
1:  A      1       1
2:  A      1       2
3:  A      0       0
4:  A      1       1
5:  A      1       2
6:  B      0       0
7:  B      1       1
8:  B      0       0
9:  B      1       1
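
For the more general rules mentioned in the question (AAA restarts at 0, BBB adds 1, CCC adds 2, DDD subtracts 1), one possible approach is a running accumulation with `Reduce()`. This is only a minimal sketch: the `rules` list, the `"reset"` marker, the function name `rule_counter()`, and the sample data are all assumptions made for illustration.

```r
# Hypothetical rule table: "reset" sends the counter back to 0,
# a number is added to the current counter value
rules <- list(AAA = "reset", BBB = 1, CCC = 2, DDD = -1)

# Walk through x one value at a time, carrying the counter along
rule_counter <- function(x, rules, start = 0) {
  step <- function(acc, value) {
    rule <- rules[[value]]
    if (is.null(rule)) stop("no rule for value: ", value)
    if (identical(rule, "reset")) 0 else acc + rule
  }
  # accumulate = TRUE keeps every intermediate value; drop the initial one
  Reduce(step, x, accumulate = TRUE, init = start)[-1]
}

# Made-up example data
id <- c("A", "A", "A", "A", "B", "B", "B")
result <- c("BBB", "CCC", "AAA", "BBB", "DDD", "CCC", "AAA")

# Apply the counter separately within each id, then restore row order
counter <- unsplit(lapply(split(result, id), rule_counter, rules = rules), id)

data.frame(id, result, counter)
```

The original 0/1 problem fits the same pattern if the values are passed as character strings, e.g. `rule_counter(as.character(result), list("0" = "reset", "1" = 1))` applied within each id. For long vectors an explicit loop or a vectorised trick like the `cumsum()`/`cummax()` one in the question is likely to be faster.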
akrun
  • @ Akrun: Thank you so much for your answer! I tried a similar approach here - can you please take a look at it if you have time? https://stackoverflow.com/questions/75203442/r-creating-a-counter-function-with-dplyr – stats_noob Jan 22 '23 at 19:52