I have data that includes student scores in their
mid-terms
and theirfinal
exams.This data is arranged in a
wide
format, where each row corresponds to a unique student ID shown asSUID
.My data also includes information about teachers, which is shown by
TUserId
. Each teacher can have multiple students, hence there is a common teacher ID across several rows.I am interested in finding out if there are any instances where a teacher who gave similar scores to their students on mid-terms (as shown by
mid_sum
) gave inconsistent scores to their students on their final exams (as shown byfinal_sum
). To record such inconsistency, I want to add a columnStatus
that records this mismatch or inconsistency.
Input:
My data df
looks something like this: :
TUserId SUID mid_sum final_sum
115 201 7 1
115 309 8 2
115 404 9 1
209 245 10 2
209 398 10 2
209 510 10 2
209 602 10 1
371 111 11 2
371 115 11 2
371 123 11 3
371 124 11 2
Output:
For my output, I require something like this: :
TUserId SUID mid_sum final_sum Status
115 201 7 1 consistent
115 309 8 2 consistent
115 404 9 1 inconsistent
209 245 10 2 consistent
209 398 10 2 consistent
209 510 10 2 consistent
209 602 10 1 inconsistent
371 111 11 2 consistent
371 115 11 2 consistent
371 123 11 3 inconsistent
371 124 11 2 consistent
Requirements:
My requirements are as below:
Mid Term Score Requirements:
1- Students cannot have higher final grades (relative to each other) when their midterm scores were lower. For example, student SUID = 309
has higher final grade when their mid-term score was lower than the student SUID = 404
. In such a case I want to flag SUID = 404
as inconsistent
.
2- Students with similar mid-term scores also cannot have different final grades. For example, student SUID = 602
has lower final grade when their mid-term score was equal to the other students of the teacher TUserId = 209
. Similarly, the student SUID = 123
has higher final grade when their mid-term score was equal to the other students of the teacher TUserId = 371
.
Final Score Requirements:
1- However, the same final
score can be assigned to students with different mid-term scores. I realise that this requirement is a bit confusing. As long as the mid-term scores stay the same or they increase, the final grades can stay the same. But not the other way around i.e., if the mid-term score starts decreasing within that teacher then the final grade cannot stay the same.
2- Moreover, if the mid-term scores are increasing, the final scores can also increase (or stay the same as the previous value).
Data import dput()
The dput()
for the data frame is below:
dput(df)
structure(list(
TUserId = c(115L, 115L, 115L, 209L, 209L, 209L, 209L, 371L, 371L, 371L, 371L),
SUID = c(201L, 309L, 404L, 245L, 398L, 510L, 602L, 111L, 115L, 123L, 124L),
mid_sum = c(7L, 8L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L),
final_sum = c(1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 2L)),
class = "data.frame", row.names = c(NA, -11L))
Note:
I have students'
mid_sum
andfinal_sum
scores sorted in ascending order. I just want to identify the cases where there is an inconsistent assignment of marks.From an implementation perspective, comparisons are always made to the previous value.
I am re-posting my question as the last example did not clarify my exact requirements Identify cases where data sequence changes based on other column UserIDs.
Partial Solution:
The following solution is partially meeting my requirements but does not capture the cases where the students with similar mid-term scores have a higher final score.
library(dplyr)
df %>%
arrange(TUserId, mid_sum) %>%
group_by(TUserId) %>%
mutate(
Status = if_else(
sign(final_sum - lag(final_sum, default = 0) + lead(final_sum, default = 0))
== sign(mid_sum - lag(mid_sum, default = 0) + lead(mid_sum, default = 0)),
"consisent", "inconsistent"
)
)