2

I am calculating final averages for a course. There are about 500 students, and the grades are organized into a .csv file. Column headers include:

Name, HW1, ..., HW10, Quiz1, ..., Quiz5, Exam1, Exam2, Final

Each is weighted differently, and that shouldn't be an issue programming. However, the lowest 2 HW and the lowest Quiz are dropped for each student. How could I program this in r? Note that the HW/Quiz dropped for each student may be different (i.e. Student A has HW2, HW5, Quiz2 dropped, Student B has HW4, HW8, Quiz1 dropped).

989
  • 12,579
  • 5
  • 31
  • 53
Kirk Fogg
  • 521
  • 5
  • 14
  • 1
    Try making a [small reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), show what you've tried, and where you're stuck. – Gregor Thomas May 07 '14 at 23:50
  • 1
    A general strategy would be to use `reshape::melt` to get your data in "tidy" format (one row for every each student and assignment), then use `dplyr` to drop the lowest and do whatever other weightings/averageings you need. `dplyr` has excellent documentation. – Gregor Thomas May 07 '14 at 23:53
  • Thank you for the hint about dplyr. I'll look into this (I have the data in Excel, so I think it's already in a tidy format). – Kirk Fogg May 08 '14 at 01:05
  • I'm working on a dplyr solution now. It tends to be a little more intuitive than the do.call stuff (and speedier too) – Andrew May 08 '14 at 01:20
  • @KirkFogg Some reading about "tidy data" from the author of dplyr: vita.had.co.nz/papers/tidy-data.pdf, it's nice way to think about data structuring. – Gregor Thomas May 08 '14 at 04:53

3 Answers3

3

Here is a simpler solution. The sum_after_drop function takes a vector x and drops the i lowest scores and sums up the remaining. We invoke this function for each row in the dataset. ddply is overkill for this job, but keeps things simple. You should be able to do this with apply, except that you will have to convert the end result to a data frame.

The actual grade calculations can then be carried out on dd2. Note that using the cut function with breaks is a simple way to get letter grades from the total scores.

library(plyr)
sum_after_drop <- function(x, i){
  sum(sort(x)[-(1:i)])
}

dd2 = ddply(dd, .(Name), function(d){
  hw = sum_after_drop(d[,grepl("HW", nms)], 1)
  qz = sum_after_drop(d[,grepl("Quiz", nms)], 1)
  data.frame(hw = hw, qz = qz)
})
Ramnath
  • 54,439
  • 16
  • 125
  • 152
2

Here's a sketch of how you could approach it using the reshape2 package and base functions.

#sample data
set.seed(734)
dd<-data.frame(
    Name=letters[1:20],
    HW1=rpois(20,7),
    HW2=rpois(20,7),
    HW3=rpois(20,7),
    Quiz1=rpois(20,15),
    Quiz2=rpois(20,15),
    Quiz3=rpois(20,15)
)

Now I convert it to long format and split apart the field names

require(reshape2)
mm<-melt(dd, "Name")
mm<-cbind(mm,
    colsplit(gsub("(\\w+)(\\d+)","\\1:\\2",mm$variable, perl=T), ":",
    names=c("type","number"))
)

Now i can use by() to get a data.frame for each name and do the rest of the calculations. Here i just drop the lowest homework and lowest quiz and i give homework a weight of .2 and quizzes a weight of .8 (assuming all home works were worth 15pts and quizzes 25 pts).

grades<-unclass(by(mm, mm$Name, function(x) {
    hw <- tail(sort(x$value[x$type=="HW"]), -1);
    quiz <- tail(sort(x$value[x$type=="Quiz"]), -1);
    (sum(hw)*.2 + sum(quiz)*.8) / (length(hw)*15*.2+length(quiz)*25*.8)
}))
attr(grades, "call")<-NULL   #get rid of crud from by()
grades;

Let's check our work. Look at student "c"

   Name HW1 HW2 HW3 Quiz1 Quiz2 Quiz3
      c   6   9   7    21    20    14

Their grade should be

((9+7)*.2+(21+20)*.8) / ((15+15)*.2 + (25+25)*.8) = 0.7826087

and in fact, we see

grades["c"] == 0.7826087
MrFlick
  • 195,160
  • 17
  • 277
  • 295
0

Here's a solution with dplyr. It ranks the scores by student and type of assignment (i.e. calculates the rank order of all of student 1's homeworks, etc.), then filters out the lowest 1 (or 2, or whatever). dplyr's syntax is pretty intuitive—you should be able to walk through the code fairly easily.

# Load libraries
library(reshape2)
library(dplyr)

# Sample data
grades <- data.frame(name=c("Sally", "Jim"),
                     HW1=c(10, 9),
                     HW2=c(10, 5),
                     HW3=c(5, 10),
                     HW4=c(6, 9),
                     HW5=c(8, 9),
                     Quiz1=c(9, 5),
                     Quiz2=c(9, 10),
                     Quiz3=c(10, 8),
                     Exam1=c(95, 96))

# Melt into long form
grades.long <- melt(grades, id.vars="name", variable.name="graded.name") %.%
  mutate(graded.type=factor(sub("\\d+","", graded.name)))
grades.long

# Remove the lowest scores for each graded type
grades.filtered <- grades.long %.%
  group_by(name, graded.type) %.%
  mutate(ranked.score=rank(value, ties.method="first")) %.%  # Rank all the scores
  filter((ranked.score > 2 & graded.type=="HW") |  # Ignore the lowest two HWs
         (ranked.score > 1 & graded.type=="Quiz") |  # Ignore the lowest quiz
         (graded.type=="Exam"))
grades.filtered

# Calculate the average for each graded type
grade.totals <- grades.filtered %.%
  group_by(name, graded.type) %.%
  summarize(total=mean(value))
grade.totals

# Unmelt, just for fun
final.grades <- dcast(grade.totals, name ~ graded.type, value.var="total")
final.grades

You technically could add the summarize(total=mean(value)) to the grades.filtered data frame rather than making a separate grade.totals data frame—I separated them into multiple data frames for didactical reasons.

Andrew
  • 36,541
  • 13
  • 67
  • 93
  • What does it mean when, for example, I have the following error message: (graded.type=="Exam")) Error in eval(expr, envir, enclos) : index out of bounds – Kirk Fogg May 08 '14 at 03:56
  • The `filter` command is really just a long conditional statement, separated with newlines for readability. Make sure everything is nested properly: `filter((stuff for HW) | (stuff for quizzes) | (stuff for exams) | (stuff for other things, like the final))`. – Andrew May 08 '14 at 04:06
  • Does the code work if you run it as is (before making your own modifications)? It should give [this output](http://rpubs.com/andrewheiss/grades_example). – Andrew May 08 '14 at 04:09
  • Your code worked correctly, and it worked for me when I had a small sample size. However it didn't work when my data set was the entire class roster of 500 students. I'll go back and see if everything is correctly formatted. Thanks! – Kirk Fogg May 08 '14 at 14:12
  • One more issue comes up, since I have a HW11 and HW12. When I use gsub, it only removes the first "1", and not the second digit (I think). So in grades.long, these are still listed as "HW1", whereas the others are listed as just "HW", which is what I want in the end. – Kirk Fogg May 08 '14 at 14:37
  • Ooh. That might be causing the other error too. I just changed the line with the `gsub()` command—that should work better (it replaces all numbers with nothing). – Andrew May 08 '14 at 14:43
  • That did the trick. But unfortunately, I am still receiving the error message from before. For reference, I have 10 HWs ranging from 0-5, 5 Quizzes ranging from 0-10, and 2 Exams ranging from 0-100, for nearly 500 students/rows. Not sure if the issue is due to any of that... – Kirk Fogg May 08 '14 at 15:06
  • Take subsets of your data and try to find the problem rows (like, only do 50 rows at a time or something). – Andrew May 08 '14 at 18:29