I am working with the R programming language. I have the following dataset that represents different students taking an exam across a period of dates - each student can either "pass" (1) or "fail" (0) the exam:
# Load the data.table package
library(data.table)
# Generate some sample data
id = sample.int(10000, 100000, replace = TRUE)
res = c(1,0)
results = sample(res, 100000, replace = TRUE)
date_exam_taken = sample(seq(as.Date('1999/01/01'), as.Date('2020/01/01'), by="day"), 100000, replace = TRUE)
# Create a data table from the sample data
my_data = data.table(id, results, date_exam_taken)
my_data <- my_data[order(id, date_exam_taken)]
# Generate some additional columns for each record
my_data$general_id = 1:nrow(my_data)
my_data$exam_number = ave(my_data$general_id, my_data$id, FUN = seq_along)
my_data$general_id = NULL
id results date_exam_taken exam_number
1: 1 0 2002-10-06 1
2: 1 1 2003-07-21 2
3: 1 1 2003-10-15 3
4: 1 0 2005-07-21 4
5: 1 1 2014-08-22 5
6: 1 1 2015-09-11 6
Grouped over students, I want to create a variable ("consec") that sums the "consecutive number of passed exams" - and this variable should go back to 0 after the first failed exam.
Based on the answer provided here (Create counter within consecutive runs of certain values), I think I can do this for the entire dataset at once:
my_data$consec = unlist(lapply(split(my_data$results, c(0, cumsum(abs(diff(!my_data$results == 1))))), function(x) (x[1] == 1) * seq(length(x))))
However, this is counting consecutive passed exams without taking into consideration individual students - each time a new student appears, I would like the "consec" variable to restart.
Can someone please show me how to do this?