
I have a data frame with the following structure (output of dput(scoreDF)):

scoreDF <- structure(list(ID = c(1, 2), Status = structure(c(2L, 1L),
  .Label = c("Fail", "Pass"), class = "factor"), Subject_1_Score = c(100, 25),
  Subject_2_Score = c(50, 76)), .Names = c("ID", "Status", "Subject_1_Score",
  "Subject_2_Score"), row.names = c(NA, -2L), class = "data.frame")

Now I need to compute the percentage of students who passed and who failed, the mean score for each of those two groups, and the standard error of each mean.

For standard error, I have defined a function as follows:

stdErr <- function(x) {sd(x)/ sqrt(length(x))}

where I expect x to be a vector whose standard error needs to be calculated.

I have read the documentation for ddply, but I cannot figure out how to calculate the percentage, i.e. (number of passes) / (total count), and the standard error for the data frame above.

Joshua Ulrich
name_masked
  • This is not a reproducible question. See e.g. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for inspiration. – Dirk Eddelbuettel Oct 11 '12 at 20:18
  • No need for `plyr` if I understand your question: `nrow(Data[Data$Status=='Pass',])/nrow(Data)`. Unless you want to split on `ID`: `ddply(Data, .(ID), summarise, sum(Status=='Pass')/length(Status))` – Justin Oct 11 '12 at 20:19
  • @Justin: I was hoping to find a way where I do not have to hard code the values, like `Status == 'Pass'`. This is why I was trying to find something using `ddply`. Is it possible to summarise by `Status` instead of `ID`? – name_masked Oct 11 '12 at 21:07
  • I don't understand how you can calculate `# passes / total` without "hard coding" that fact somewhere. – Justin Oct 11 '12 at 23:54

1 Answer

You can use tapply to calculate group statistics. If your data frame is called students, then to calculate the mean score by pass/fail status you would write:

tapply(students$Subject_1_Score, students$Status, FUN=mean)

For the standard error, pass your stdErr function as the FUN argument in place of mean.
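As a minimal sketch of the two tapply calls side by side, the block below rebuilds the question's data with data.frame() under the name students (the name this answer assumes) and reuses the asker's stdErr function. The variable names group_mean and group_se are only illustrative.

```r
# Data from the question, rebuilt by hand; "students" is the name assumed in this answer
students <- data.frame(
  ID = c(1, 2),
  Status = factor(c("Pass", "Fail"), levels = c("Fail", "Pass")),
  Subject_1_Score = c(100, 25),
  Subject_2_Score = c(50, 76)
)

# Standard error function as defined in the question
stdErr <- function(x) sd(x) / sqrt(length(x))

# One value per level of Status
group_mean <- tapply(students$Subject_1_Score, students$Status, FUN = mean)
group_se   <- tapply(students$Subject_1_Score, students$Status, FUN = stdErr)

print(group_mean)
print(group_se)
```

With only one student per group in this sample, sd() returns NA, so the standard errors come out as NA. With two or more students per group they would be numeric.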

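tapply takes a single vector, so to calculate something across multiple columns, apply it to each score column in turn (columns 3 and 4 in this data frame):

sapply(students[,3:4], function(s) tapply(s, students$Status, FUN=mean))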
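For several score columns at once, base R's aggregate is another option that returns a data frame with one row per status. This is a sketch on the question's data, selecting the score columns by name. score_cols and by_status are illustrative names, not something from the question.

```r
# Data from the question, rebuilt by hand
students <- data.frame(
  ID = c(1, 2),
  Status = factor(c("Pass", "Fail"), levels = c("Fail", "Pass")),
  Subject_1_Score = c(100, 25),
  Subject_2_Score = c(50, 76)
)

# Select score columns by name rather than by position
score_cols <- c("Subject_1_Score", "Subject_2_Score")

# Mean of every score column within each Status group
by_status <- aggregate(students[, score_cols],
                       by = list(Status = students$Status),
                       FUN = mean)
print(by_status)
```

Swapping FUN = mean for the stdErr function would give the per-group standard errors in the same layout.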

To calculate percent of students that passed:

dim(students[students$Status == "Pass" ,])[1] / dim(students)[1]

Or, using a score cutoff instead of the Status column:

dim(students[students$Subject_1_Score >= 65 ,])[1] / dim(students)[1]

Both are data frame versions of the following vector idiom, where x is a vector of statuses:

length(x[x == "Pass"]) / length(x)
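If the goal is to avoid naming "Pass" or "Fail" explicitly, as raised in the comments on the question, one possible approach is to tabulate the Status column and convert the counts to proportions. The sketch below uses the two statuses from the question. status and pct are illustrative names.

```r
# Status column from the question's data
status <- factor(c("Pass", "Fail"), levels = c("Fail", "Pass"))

# Count each level, convert counts to proportions, then scale to percent
pct <- prop.table(table(status)) * 100
print(pct)

# Shortcut for one level: the mean of a logical vector is the share of TRUE values
pass_share <- mean(status == "Pass")
print(pass_share)
```

table() picks up whatever levels the factor has, so the same code would still work if a third status were added.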

To calculate a function across rows or columns you can use apply.
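A short sketch of apply on the score columns of the question's data: the second argument is 1 for rows (one result per student) and 2 for columns (one result per subject). The names scores, row_means and col_means are illustrative.

```r
# Data from the question, rebuilt by hand
students <- data.frame(
  ID = c(1, 2),
  Status = factor(c("Pass", "Fail"), levels = c("Fail", "Pass")),
  Subject_1_Score = c(100, 25),
  Subject_2_Score = c(50, 76)
)

# apply works on a matrix, so keep only the numeric score columns
scores <- as.matrix(students[, c("Subject_1_Score", "Subject_2_Score")])

row_means <- apply(scores, 1, mean)  # mean score of each student
col_means <- apply(scores, 2, mean)  # mean score of each subject

print(row_means)
print(col_means)
```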

Jeffrey Evans
  • `standard error substitute your stdErr function for mean.` .. but they are not same right? – name_masked Oct 11 '12 at 20:53
  • The FUN argument passes a function to tapply. If you want to use your stdErr function: `tapply(students$Subject_1_Score, students$Status, FUN=stdErr)` – Jeffrey Evans Oct 11 '12 at 20:57