I am trying to analyse large data sets of student scores. Some students do retakes, which produces duplicate rows for the same student, usually with the earlier, lower score in the row directly above the (usually higher) retake score. I want to keep only each student's highest score, so that the file has one line per student (I will need to merge it with other files that use the same IDs).
The source file looks like this:
STUDID MATRISUBJ SUBJSCORE
1032 AfrikaansB 2
1032 isiZuluB 7
1033 isiXhosaB 6
1034 AfrikaansB 1
1034 EnglishB 4
1034 isiZuluB 3
The result should look like this:
STUDID MATRISUBJ SUBJSCORE
1032 isiZuluB 7
1033 isiXhosaB 6
1034 EnglishB 4
Help, please. I used to do this in SPSS, but I no longer have access to that commercial software, so I am switching to R. This is what I have tried so far:
df2[!duplicated(df2[1:1]),]
This gives the first row for each duplicated student, but I want the row with the highest score. Note that a student sometimes retakes with a different subject to get the required score in languages, so the subject can differ between a student's rows.
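One possible approach, offered as a sketch rather than a definitive answer: sort the rows so that each student's highest score comes first, then apply the same `duplicated()` idea as above. The sample data is typed in from the example in this question, and the names `sorted` and `best` are only illustrative; it assumes `SUBJSCORE` is numeric.

```r
# Sample data copied from the question; df2 is the name used in the attempt above.
df2 <- data.frame(
  STUDID    = c(1032, 1032, 1033, 1034, 1034, 1034),
  MATRISUBJ = c("AfrikaansB", "isiZuluB", "isiXhosaB",
                "AfrikaansB", "EnglishB", "isiZuluB"),
  SUBJSCORE = c(2, 7, 6, 1, 4, 3),
  stringsAsFactors = FALSE
)

# Sort by student, then by score descending, so each student's
# highest-scoring row is the first one in their group.
sorted <- df2[order(df2$STUDID, -df2$SUBJSCORE), ]

# duplicated() flags every row after the first per STUDID,
# so negating it keeps only the highest-scoring row.
best <- sorted[!duplicated(sorted$STUDID), ]
rownames(best) <- NULL

print(best)
```

Because the whole row is kept, the subject travels with the highest score, which covers the case where the retake is in a different subject. If a student has two rows tied for the highest score, only one of them is kept.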