average an unknown number of responses per respondent; R

Question

Scenario: I have a data frame, "scores", of multiple users' attempts at passing a test. Each observation is one attempt, with the userID and score. Some users may pass on their first attempt, some might take several; they get unlimited attempts. I want to find the average score for each user.

For example:

userID = c(1:20, sample(1:20, 10, replace = TRUE))
score = c(rnorm(15, mean = 60, sd = 10), rnorm(8, mean = 70, sd = 5), 
rnorm(7, mean = 90, sd = 2))
scores = data.frame(userID, score)

I need an end result data frame that is just a list of unique userIDs with the average of all of their attempts (whether they attempted once or several times).

Of all the dumb approaches I've tried, my most recent was:

avgScores = aggregate(scores, by=list("userID"), "mean")

and got the following error message: "arguments must have same length". I've also tried sorting and subsetting (the actual data frame has time stamps) and wiggling my nose and tapping my shoes together, but I'm getting nowhere and this noob brain is fried.

THANK YOU

agstudy · Accepted Answer · 2015-03-06T22:51:53.960

5

It is better (more elegant) here to use aggregate with the formula interface:

aggregate(score~userID,scores,mean)

Or use the classic form as you tried, but you get a slightly different result (an extra Group.1 grouping column alongside userID and score):

aggregate(scores, by = list(scores$userID), mean) ## pass the column itself, not the string "userID"

Of course, if you have a big data.frame, it is better to use one of the solutions suggested in the other answers.
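
To make the difference between the two forms concrete, here is a minimal sketch on a small made-up data frame (the six scores are illustrative values, not the asker's data). It also shows where the error in the question likely comes from: list("userID") holds a single string of length 1, while aggregate expects each grouping vector in `by` to be as long as the data.

```r
# Hypothetical data: user 1 has three attempts, user 2 has two, user 3 has one
scores <- data.frame(
  userID = c(1, 2, 3, 1, 2, 1),
  score  = c(40, 60, 90, 60, 80, 80)
)

# Formula form: one row per userID, columns userID and score
by_formula <- aggregate(score ~ userID, data = scores, FUN = mean)

# Classic form: `by` is a list of grouping vectors, each as long as the data
by_classic <- aggregate(scores, by = list(scores$userID), FUN = mean)

# A bare string is a length-1 vector, so the length check fails
err <- tryCatch(
  aggregate(scores, by = list("userID"), FUN = mean),
  error = function(e) conditionMessage(e)
)

print(by_formula)         # means per user: 60, 70, 90
print(names(by_classic))  # "Group.1" "userID" "score"
print(err)                # "arguments must have same length"
```

If the extra Group.1 column is unwanted, the formula form is the simpler choice, since it returns only the grouping column and the averaged column.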

edited Mar 06 '15 at 22:51

answered Mar 06 '15 at 22:46

agstudy


score 3 · Answer 2 · answered Mar 06 '15 at 22:45


# data.table
library(data.table)
DT <- data.table(scores)
DT[, .(mean_score = mean(score)), by = userID]

# dplyr
library(dplyr)
scores %>%
  group_by(userID) %>%
  summarise(mean_score = mean(score))
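
As a possible cross-check that needs neither package, base R's tapply computes the same per-user means. This is only a sketch on made-up values (the data frame below and the name `check` are illustrative, not from the question); the column name mean_score is chosen to match the snippets above.

```r
# Hypothetical data: user 1 has three attempts, user 2 has two, user 3 has one
scores <- data.frame(
  userID = c(1, 2, 3, 1, 2, 1),
  score  = c(40, 60, 90, 60, 80, 80)
)

# tapply returns a named vector: names are the userIDs, values are the means
means <- tapply(scores$score, scores$userID, mean)

# Reshape into the same two-column layout as the grouped summaries
check <- data.frame(
  userID     = as.numeric(names(means)),
  mean_score = as.vector(means)
)
print(check)
```

Comparing `check` against the data.table or dplyr output on the same data is a quick way to confirm the grouping did what was intended.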

answered Mar 06 '15 at 22:45

Metrics


never even considered the data.table approach so I'll have to play around with that as well as the dplyr. Thanks for the help! – blerg Mar 09 '15 at 14:35

score 3 · Answer 3 · answered Mar 06 '15 at 22:45


You can do:

library(dplyr)
scores %>% group_by(userID) %>% summarise(mean = mean(score))

answered Mar 06 '15 at 22:45

Steven Beaupré


Also worked and I appreciated the different approaches. Thanks! – blerg Mar 09 '15 at 14:33
