How do I create an index variable based on three variables in R?

Question

I'm trying to create an index variable in R based on an individual identifier, a test name, and the date the test was taken. My data has students repeatedly taking the same test, with a different score each time. I'd like to easily identify which attempt each observation is for that specific test. My data looks something like this, and I'd like to create a variable like the id variable shown further down. It should start over at 1 for each combination of student and test name and count, in order of date, the observations with the same student and test name.

student <- c(1,1,1,1,1,1,2,2,2,3,3,3,3,3)
test <- c("math","math","reading","math","reading","reading","reading","math","reading","math","math","math","reading","reading")
date <- c(1,2,3,3,4,5,2,3,5,1,2,3,4,5)
data <- data.frame(student,test,date)
print(data)
   student    test date
1        1    math    1
2        1    math    2
3        1 reading    3
4        1    math    3
5        1 reading    4
6        1 reading    5
7        2 reading    2
8        2    math    3
9        2 reading    5
10       3    math    1
11       3    math    2
12       3    math    3
13       3 reading    4
14       3 reading    5

I want to add a variable that indicates the attempt number for a test taken by the same student so it looks something like this:

   student    test date id
1        1    math    1  1
2        1    math    2  2
3        1 reading    3  1
4        1    math    3  3
5        1 reading    4  2
6        1 reading    5  3
7        2 reading    2  1
8        2    math    3  1
9        2 reading    5  2
10       3    math    1  1
11       3    math    2  2
12       3    math    3  3
13       3 reading    4  1
14       3 reading    5  2

I figured out how to create an ID variable based on only one other variable, for example the student number, but I don't know how to do it for multiple variables. I also tried cumsum, but that keeps counting up with each new value and doesn't start over at 1 when there is a new student or test.

tests <- transform(tests, ID = as.numeric(factor(EMPLID)))
tests$id <- cumsum(!duplicated(tests[1:3]))

Please provide enough code so others can better understand or reproduce the problem. — Community, Jan 24 '23 at 21:01

score 0 · Answer 1 · answered Jan 24 '23 at 21:35

library(dplyr)
data  %>%
  group_by(student, test) %>%
  arrange(date, .by_group = TRUE) %>%  ## make sure things are sorted by date
  mutate(id = row_number()) %>%
  ungroup()
# # A tibble: 14 × 4
#    student test     date    id
#      <dbl> <chr>   <dbl> <int>
#  1       1 math        1     1
#  2       1 math        2     2
#  3       1 math        3     3
#  4       1 reading     3     1
#  5       1 reading     4     2
#  6       1 reading     5     3
#  7       2 math        3     1
#  8       2 reading     2     1
#  9       2 reading     5     2
# 10       3 math        1     1
# 11       3 math        2     2
# 12       3 math        3     3
# 13       3 reading     4     1
# 14       3 reading     5     2
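
If you would rather not load dplyr, or you want to keep the rows in their original order as in the desired output in the question, here is a rough base R sketch. It is an alternative I am suggesting, not part of the dplyr approach above, and it assumes that ties on date within a student and test should be numbered in row order (hence ties.method = "first").

```r
# Rebuild the example data from the question
student <- c(1,1,1,1,1,1,2,2,2,3,3,3,3,3)
test <- c("math","math","reading","math","reading","reading","reading",
          "math","reading","math","math","math","reading","reading")
date <- c(1,2,3,3,4,5,2,3,5,1,2,3,4,5)
data <- data.frame(student, test, date)

# ave() splits the row indices by student and test, applies FUN to each
# group, and puts the results back in the original row positions.
# Within a group, rank() on date gives the attempt number.
data$id <- ave(seq_len(nrow(data)), data$student, data$test,
               FUN = function(i) rank(data$date[i], ties.method = "first"))
print(data)
```

Because ave() writes the results back by position, the data frame does not need to be sorted first, and the id column comes out in the same row order as the table in the question.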
