4

Scenario: I have a df, "scores", of multiple users' attempts at passing a test. Each observation is an attempt, with the userID and score. Some users may pass on their first attempt and some might take several; they get unlimited attempts. I want to find the average score for each user.

For example:

userID = c(1:20, sample(1:20, 10, replace = TRUE))
score = c(rnorm(15, mean = 60, sd = 10), rnorm(8, mean = 70, sd = 5), 
rnorm(7, mean = 90, sd = 2))
scores = data.frame(userID, score)

I need an end result data frame that is just a list of unique userIDs with the average of all of their attempts (whether they attempted once or several times).

Of all the dumb approaches I've tried, my most recent was:

avgScores = aggregate(scores, by=list("userID"), "mean")

and got the following error message: "arguments must have same length." I've also tried sorting and subsetting (the actual data frame has timestamps) and wiggling my nose and tapping my shoes together, but I'm getting nowhere and this noob brain is fried.

THANK YOU

blerg

3 Answers

5

It is better (more elegant) here to use aggregate with the formula interface:

aggregate(score~userID,scores,mean)

Or use the classic form, as you tried, but the result is slightly different: the output gains a Group.1 column and the userID column is averaged as well:

aggregate(scores,by=list(userID),mean) ## using name and not string
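
As a minimal sketch of how the two forms can be made to agree, assuming a small hand-made data frame (the name toy and its values are hypothetical, not the question's random data): pass only the score column to the classic form and name the grouping list, so that the grouping column is called userID instead of Group.1.

```r
# Hypothetical toy data: three users, two of them with two attempts each
toy <- data.frame(
  userID = c(1, 2, 1, 3, 2),
  score  = c(50, 70, 60, 90, 80)
)

# Formula interface: one row per userID with the mean of score
by_formula <- aggregate(score ~ userID, data = toy, FUN = mean)

# Classic interface: aggregate only the score column and name the
# grouping list, so the output columns are userID and score
by_classic <- aggregate(toy["score"], by = list(userID = toy$userID), FUN = mean)

print(by_formula)
print(by_classic)
```

With this toy data both calls should give one row per user, with mean scores of 55, 75 and 90 for users 1, 2 and 3.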

Of course, if you have a big data.frame, it is better to use one of the solutions suggested in the other answers.

agstudy
3
#data.table
library(data.table)
DT<-data.table(scores)
DT[,.(mean_score=mean(score)),by=userID]

#dplyr
library(dplyr)
scores %>%
group_by(userID)%>%
summarise(mean_score=mean(score))
Metrics
  • Never even considered the data.table approach, so I'll have to play around with that as well as dplyr. Thanks for the help! – blerg Mar 09 '15 at 14:35
3

You can do:

library(dplyr)
scores %>% group_by(userID) %>% summarise(mean = mean(score))
Steven Beaupré