I'm trying to create an index variable in R based on an individual identifier, a test name, and the date the test was taken. My data has students taking the same test repeatedly, with a different score each time. I'd like to be able to easily identify which attempt each observation is for that specific test. My data looks something like this, and I'd like to create a variable like the id variable shown further down. It should restart at 1 for each combination of student and test name and count the observations in order of date.
student <- c(1,1,1,1,1,1,2,2,2,3,3,3,3,3)
test <- c("math","math","reading","math","reading","reading","reading","math","reading","math","math","math","reading","reading")
date <- c(1,2,3,3,4,5,2,3,5,1,2,3,4,5)
data <- data.frame(student,test,date)
print(data)
   student    test date
1        1    math    1
2        1    math    2
3        1 reading    3
4        1    math    3
5        1 reading    4
6        1 reading    5
7        2 reading    2
8        2    math    3
9        2 reading    5
10       3    math    1
11       3    math    2
12       3    math    3
13       3 reading    4
14       3 reading    5
I want to add a variable that indicates the attempt number for a test taken by the same student so it looks something like this:
   student    test date id
1        1    math    1  1
2        1    math    2  2
3        1 reading    3  1
4        1    math    3  3
5        1 reading    4  2
6        1 reading    5  3
7        2 reading    2  1
8        2    math    3  1
9        2 reading    5  2
10       3    math    1  1
11       3    math    2  2
12       3    math    3  3
13       3 reading    4  1
14       3 reading    5  2
I figured out how to create an ID variable based on only one other variable, for example the student number, but I don't know how to do it for multiple variables. I also tried cumsum, but that keeps counting up with each new value and doesn't start over at 1 when a new value appears.
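# Attempt 1: an ID based on a single variable (here the student number)
data <- transform(data, ID = as.numeric(factor(student)))
# Attempt 2: a running count of distinct (student, test, date) rows; it never resets to 1
data$id <- cumsum(!duplicated(data[1:3]))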
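For reference, here is a minimal sketch of one way the counter I'm describing could be computed in base R, using ave() to rank the dates within each student and test combination. It rebuilds the example data so it runs on its own. The choice of ties.method = "first" is an assumption on my part: it only matters if a student takes the same test twice on the same date, which the example data does not contain.

```r
# Example data from the question
student <- c(1,1,1,1,1,1,2,2,2,3,3,3,3,3)
test <- c("math","math","reading","math","reading","reading","reading",
          "math","reading","math","math","math","reading","reading")
date <- c(1,2,3,3,4,5,2,3,5,1,2,3,4,5)
data <- data.frame(student, test, date)

# Within each (student, test) group, rank the rows by date so that the
# earliest attempt gets 1, the next gets 2, and so on.
# Assumption: ties.method = "first" breaks same-date ties by row order.
data$id <- ave(data$date, data$student, data$test,
               FUN = function(d) rank(d, ties.method = "first"))

print(data)
```

Because the ranking is done on date inside each group, the rows would not need to be sorted beforehand for this to give the attempt number.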