In R, remove all instances of a row with exactly n duplicates

Question

I have a data frame in R with student grade data. Here's some sample data to show how it's structured:

student.data <- data.frame(
                 "CourseNumber" = c(101, 101, 101, 102, 102, 102, 103, 103, 104, 104, 104, 105, 106, 106, 106),
                 "TermID" = c("T1", "E1", "S1", "T1", "E1", "S1", "T1", "S1", "T1", "E1", "S1", "S1", "T1", "E1", "S1"),
                 "StudentID" = c("9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "8640", "8640", "8640", "8640"),
                 "Grade" = c(92, 80, 91, 83, 87, 84, 79, 79, 79, 85, 81, 98, 93, 97, 94))

student.data

##    CourseNumber TermID StudentID Grade
## 1           101     T1      9100    92
## 2           101     E1      9100    80
## 3           101     S1      9100    91

## 4           102     T1      9100    83
## 5           102     E1      9100    87
## 6           102     S1      9100    84

## 7           103     T1      9100    79
## 8           103     S1      9100    79

## 9           104     T1      9100    79
## 10          104     E1      9100    85
## 11          104     S1      9100    81

## 12          105     S1      8640    98

## 13          106     T1      8640    93
## 14          106     E1      8640    97
## 15          106     S1      8640    94

A single student will usually (but not always) receive 3 grades in a single course: T1, E1, and S1. That's why I've added extra breaks in the printout--each grouping of 3 represents a student's grades in a single course. T1 is his pre-exam score, E1 is his final exam score, and S1 is his post-exam score. The 3 are supposed to be related by the formula S1 = (80%)(T1) + (20%)(E1), and I'm finding instances where a higher S1 grade was awarded.
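
To make the formula concrete, here is a minimal sketch using the course 101 grades for student 9100 from the sample data. The variable names (t1, e1, s1, expected_s1, bump) are made up for this illustration and are not taken from my actual code:

```r
# Course 101, student 9100, values copied from the sample data
t1 <- 92  # pre-exam score
e1 <- 80  # final exam score
s1 <- 91  # post-exam score actually awarded

# Formula from the description: S1 = 80% of T1 + 20% of E1
expected_s1 <- 0.8 * t1 + 0.2 * e1

# A positive value means a higher S1 was awarded than the formula gives
bump <- s1 - expected_s1
bump
```

Here the formula gives 89.6, so the awarded 91 is a bump of roughly 1.4 points.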

My problem is that I need to remove rows 7, 8, and 12 because those courses did not award all 3 grades (T1, E1, S1). This will help, because I already have some code that successfully calculates the amount of "bump" the teacher gave, but it throws an error every time it reaches a student-course combination that doesn't have all 3 grades (T1, E1, and S1).

So, my question is: how would I remove all rows for which there are not exactly 3 duplicates, as determined by the variables CourseNumber and StudentID? (A solution would work if it removes all rows for which there are fewer than 3 duplicates, or even if it just removed all rows for which there were exactly 2 duplicates.)

I found this clever answer and tried the code below, but it only removes rows for which there is no duplicate at all. So, it removes row 12 but doesn't also remove rows 7 and 8.

temp <- student.data[ , c("CourseNumber","StudentID") ]
student.data <- student.data[duplicated(temp) | duplicated(temp, fromLast = TRUE), ]

student.data

##    CourseNumber TermID StudentID Grade
## 1           101     T1      9100    92
## 2           101     E1      9100    80
## 3           101     S1      9100    91

## 4           102     T1      9100    83
## 5           102     E1      9100    87
## 6           102     S1      9100    84

## 7           103     T1      9100    79
## 8           103     S1      9100    79

## 9           104     T1      9100    79
## 10          104     E1      9100    85
## 11          104     S1      9100    81

## 13          106     T1      8640    93
## 14          106     E1      8640    97
## 15          106     S1      8640    94
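
A small sketch of why this happens, using a cut-down copy of the key columns (the names keys and has_dup are only for illustration). duplicated() reports whether a key combination occurs more than once, not how many times it occurs, so a pair is flagged the same way as a triple:

```r
# Cut-down key columns: one group of 3, one group of 2, one single row
keys <- data.frame(
  CourseNumber = c(101, 101, 101, 103, 103, 105),
  StudentID    = c("9100", "9100", "9100", "9100", "9100", "8640")
)

# TRUE for every row whose CourseNumber/StudentID combination appears
# more than once, regardless of whether it appears 2 or 3 times
has_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

cbind(keys, has_dup)
```

Only the course 105 row comes back FALSE, which matches what I see on the full data.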

Here's the output I'm trying to achieve:

##    CourseNumber TermID StudentID Grade
## 1           101     T1      9100    92
## 2           101     E1      9100    80
## 3           101     S1      9100    91

## 4           102     T1      9100    83
## 5           102     E1      9100    87
## 6           102     S1      9100    84

## 9           104     T1      9100    79
## 10          104     E1      9100    85
## 11          104     S1      9100    81

## 13          106     T1      8640    93
## 14          106     E1      8640    97
## 15          106     S1      8640    94

akrun · Accepted Answer · 2021-04-24T19:56:58.717

Based on the description, the OP may need to return only the groups that have exactly 3 rows:

library(dplyr)
student.data %>% 
   group_by(CourseNumber, StudentID) %>%
   filter(n() == 3) %>%
   ungroup()

-output

# A tibble: 12 x 4
#   CourseNumber TermID StudentID Grade
#          <dbl> <chr>  <chr>     <dbl>
# 1          101 T1     9100         92
# 2          101 E1     9100         80
# 3          101 S1     9100         91
# 4          102 T1     9100         83
# 5          102 E1     9100         87
# 6          102 S1     9100         84
# 7          104 T1     9100         79
# 8          104 E1     9100         85
# 9          104 S1     9100         81
#10          106 T1     8640         93
#11          106 E1     8640         97
#12          106 S1     8640         94
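
If dplyr is not available, the same filter can probably be done in base R. This is only a sketch, not part of the original answer: it uses ave() to count the rows in each CourseNumber/StudentID group, and the names group_size and complete are made up for the example.

```r
# Sample data copied from the question
student.data <- data.frame(
  CourseNumber = c(101, 101, 101, 102, 102, 102, 103, 103, 104, 104, 104, 105, 106, 106, 106),
  TermID = c("T1", "E1", "S1", "T1", "E1", "S1", "T1", "S1", "T1", "E1", "S1", "S1", "T1", "E1", "S1"),
  StudentID = c("9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "8640", "8640", "8640", "8640"),
  Grade = c(92, 80, 91, 83, 87, 84, 79, 79, 79, 85, 81, 98, 93, 97, 94)
)

# For each row, count how many rows share its CourseNumber/StudentID pair
group_size <- ave(seq_len(nrow(student.data)),
                  student.data$CourseNumber, student.data$StudentID,
                  FUN = length)

# Keep only the rows that belong to a group of exactly 3
complete <- student.data[group_size == 3, ]
complete
```

Unlike the dplyr version, this keeps the original row names (1 to 6, 9 to 11, 13 to 15). Changing the condition to group_size >= 3 or group_size != 2 would cover the other variants mentioned in the question.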

Thanks!! I just added my expected output. This is exactly what I was looking for, including filter(n() == 3). Seeing the other examples you had shown were also really helpful. I'll accept this answer once the waiting period is complete! — jdcode, Apr 24 '21 at 19:56
