I have a data.table
consisting of anonymised final exam results for students. Each row is a student, each column a subject.
If a student took a subject they have a their resulting grade, and otherwise NA
if they did not. I want to find the total number of subjects each student took.
The dt
comes with this (num_subjects
), but it's not coded to account for some subjects allowing study at 'double' intensity. Double intensity is denoted by having 2 grades recorded rather than just 1.
So I need to write a code that looks for instances of double grades like AA or AB or CC, and where found, adds +1 to a new variable for the number of subjects taken.
Example data: note there are actually 30+ potential subjects
subject1 subject2 subject3 num_subjects
AA NA NA 1
B BB C 3
NA NA A 1
NA CC D 2
Desired output
subject1 subject2 subject3 num_subjects new_num_subjects
AA NA NA 1 2
B BB C 3 4
NA NA A 1 1
NA CC D 2 3
To do this for one subject, I'd just do:
dt[, new_num_subjects:= num_subjects]
dt[
(subject1=="AA" |
subject1=="AB" |
subject1=="BB" |
subject1=="BC" |
subject1=="CC" |
), new_num_subjects:= num_subjects+1]
This works. But how to avoid writing this for all 30+ subjects?
Have tried making a vector of all subject names (subjectvec
) and looping through it (see below), but this does not work:
for(i in subjectvec)
dt[
( (i) =="AA" |
(i) =="AB" |
(i) =="BB" |
(i) =="BC" |
(i) =="CC" |
), new_num_subjects:= num_subjects+1]