I have a dataframe with sets of scores, and sets of grouping variables, something like:
s1 s2 s3 g1 g2 g3
4 3 7 F F T
6 2 2 T T T
2 4 9 G G F
1 3 1 T F G
I want to run an aggregate; at the moment I'm doing:
aggregate(df[c("s1","s2","s3")],df["g1"],function(x) c(m =mean(x, na.rm=T), sd = sd(x, na.rm=T), n = length(x)))
I'd like to have just one line of code, so I could aggregate the multiple variables by each of the multiple factors all at once. Note I'm not trying to get a summary of s1-3 by combinations of g1-3 (as per answers here). I've looked at summaryBy in the doBy package, but again that seems to do combinations of the factors rather than one overall summary per factor, which isn't what I want (useful though!). I've been playing with variants on:
apply(df[c("g1","g2","g3")], 2, function (z) aggregate(df[c("s1","s2","s3")],z,function(x) c(m =mean(x, na.rm=T), sd = sd(x, na.rm=T), n = length(x))))
But I get the error: "'by' must be a list" with that. I think I could work out how to do this with a loop, and I know with various versions of ddply or reshape you can get aggregation, but the most intuitive way (to me at least) seems to be an apply and aggregate. What am I missing?
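For reference, here is a minimal sketch of the kind of thing I'm after, assuming the grouping columns are plain character vectors. apply() converts the data frame to a matrix and hands each column to the function as an atomic vector, which would explain the "'by' must be a list" complaint. Looping over the column names with lapply() and passing df[g] (a one-column data frame, and therefore a list) might avoid that. The names scores, groups, score_summary and results are made up for illustration.

```r
# Example data from the question (grouping columns assumed to be character).
df <- data.frame(
  s1 = c(4, 6, 2, 1),
  s2 = c(3, 2, 4, 3),
  s3 = c(7, 2, 9, 1),
  g1 = c("F", "T", "G", "T"),
  g2 = c("F", "T", "G", "F"),
  g3 = c("T", "T", "F", "G")
)

scores <- c("s1", "s2", "s3")  # hypothetical name: columns to summarise
groups <- c("g1", "g2", "g3")  # hypothetical name: grouping columns, used one at a time

# Hypothetical helper: the same summary as in the question.
score_summary <- function(x) {
  c(m = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE), n = length(x))
}

# Iterate over the grouping column *names*; df[g] keeps the data frame
# (list) structure that aggregate() requires for its `by` argument.
results <- lapply(groups, function(g) aggregate(df[scores], df[g], score_summary))
names(results) <- groups

results$g1  # summary of s1-s3 by g1 alone; likewise results$g2, results$g3
```

This gives a list with one aggregated data frame per grouping variable, rather than a summary over combinations of g1-3. Groups with a single row get NA for sd.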