I have a data frame in R with student grade data. Here's some sample data to show how it's structured:
student.data <- data.frame(
"CourseNumber" = c(101, 101, 101, 102, 102, 102, 103, 103, 104, 104, 104, 105, 106, 106, 106),
"TermID" = c("T1", "E1", "S1", "T1", "E1", "S1", "T1", "S1", "T1", "E1", "S1", "S1", "T1", "E1", "S1"),
"StudentID" = c("9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "9100", "8640", "8640", "8640", "8640"),
"Grade" = c(92, 80, 91, 83, 87, 84, 79, 79, 78, 85, 81, 98, 93, 97, 94))
student.data
## CourseNumber TermID StudentID Grade
## 1 101 T1 9100 92
## 2 101 E1 9100 80
## 3 101 S1 9100 91

## 4 102 T1 9100 83
## 5 102 E1 9100 87
## 6 102 S1 9100 84

## 7 103 T1 9100 79
## 8 103 S1 9100 79

## 9 104 T1 9100 78
## 10 104 E1 9100 85
## 11 104 S1 9100 81

## 12 105 S1 8640 98

## 13 106 T1 8640 93
## 14 106 E1 8640 97
## 15 106 S1 8640 94
A single student will usually (but not always) receive 3 grades in a single course: T1, E1, and S1. That's why I've added extra breaks in the printout: each grouping of 3 represents a student's grades in a single course. T1 is his pre-exam score, E1 is his final exam score, and S1 is his post-exam score. The 3 are supposed to be related by the formula S1 = (80%)(T1) + (20%)(E1), and I'm finding instances where a higher S1 grade was awarded.
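As a worked instance of the formula, here is the arithmetic for student 9100 in course 101, using the grades from the sample data. The variable names (t1, e1, s1, expected.s1, bump) are only illustrative and are not taken from my actual code.

```r
# Grades for student 9100 in course 101, from the sample data
t1 <- 92  # pre-exam score
e1 <- 80  # final exam score
s1 <- 91  # post-exam score that was actually awarded

# S1 as the formula says it should be: 80% of T1 plus 20% of E1
expected.s1 <- 0.8 * t1 + 0.2 * e1  # 73.6 + 16 = 89.6

# The "bump" is how far the awarded S1 exceeds the expected S1
bump <- s1 - expected.s1            # about 1.4
```

A positive bump means the awarded S1 is higher than the formula predicts.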
My problem is that I need to remove rows 7, 8, and 12 because those courses did not award all 3 grades (T1, E1, S1). This will help, because I already have some code that successfully calculates the amount of "bump" the teacher gave, but it throws an error every time it reaches a student-course combination that doesn't have all 3 grades (T1, E1, and S1).
So, my question is: how would I remove all rows for which there are not exactly 3 duplicates, as determined by the variables CourseNumber and StudentID? (A solution would work if it removes all rows for which there are fewer than 3 duplicates, or even if it just removed all rows for which there were exactly 2 duplicates.)
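One possible direction, sketched here under assumptions rather than offered as a settled answer: count the rows in each CourseNumber/StudentID group with base R's ave() and keep only the groups of size 3. The data frame demo is a cut-down stand-in for student.data, and group.size and complete are illustrative names.

```r
# Cut-down stand-in for student.data (same columns, fewer rows)
demo <- data.frame(
  CourseNumber = c(101, 101, 101, 103, 103, 105),
  TermID       = c("T1", "E1", "S1", "T1", "S1", "S1"),
  StudentID    = c("9100", "9100", "9100", "9100", "9100", "8640"),
  Grade        = c(92, 80, 91, 79, 79, 98)
)

# For every row, count how many rows share its CourseNumber/StudentID pair.
# ave() applies FUN within each group and returns a vector aligned with the rows.
group.size <- ave(seq_len(nrow(demo)), demo$CourseNumber, demo$StudentID,
                  FUN = length)

# Keep only the rows whose group has exactly 3 members
complete <- demo[group.size == 3, ]
complete  # only the three course 101 rows remain
```

Note that this only checks the number of rows per group. It does not verify that the three rows are specifically T1, E1, and S1.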
I found this clever answer and tried the code below, but it only removes rows for which there is no duplicate at all. So, it removes row 12 but doesn't also remove rows 7 and 8.
temp <- student.data[ , c("CourseNumber","StudentID") ]
student.data <- student.data[duplicated(temp) | duplicated(temp, fromLast = TRUE), ]
student.data
## CourseNumber TermID StudentID Grade
## 1 101 T1 9100 92
## 2 101 E1 9100 80
## 3 101 S1 9100 91
## 4 102 T1 9100 83
## 5 102 E1 9100 87
## 6 102 S1 9100 84
## 7 103 T1 9100 79
## 8 103 S1 9100 79
## 9 104 T1 9100 78
## 10 104 E1 9100 85
## 11 104 S1 9100 81
## 13 106 T1 8640 93
## 14 106 E1 8640 97
## 15 106 S1 8640 94
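The result above follows from how duplicated() behaves: combining the forward and backward passes flags every element that has at least one twin, so a group of 2 is kept just like a group of 3. A small sketch on a made-up key vector (key and has.dup are hypothetical names):

```r
# Hypothetical keys: "a" occurs 3 times, "b" twice, "c" once
key <- c("a", "a", "a", "b", "b", "c")

# TRUE for every element that has at least one duplicate somewhere in the vector
has.dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
has.dup
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
```

Both "b" entries are flagged, which mirrors rows 7 and 8 surviving the filter.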
Here's the output I'm trying to achieve:
## CourseNumber TermID StudentID Grade
## 1 101 T1 9100 92
## 2 101 E1 9100 80
## 3 101 S1 9100 91
## 4 102 T1 9100 83
## 5 102 E1 9100 87
## 6 102 S1 9100 84
## 9 104 T1 9100 78
## 10 104 E1 9100 85
## 11 104 S1 9100 81
## 13 106 T1 8640 93
## 14 106 E1 8640 97
## 15 106 S1 8640 94