Calculate mean by group

Question

set.seed(2218)
exdf <- data.frame(
    c(rep(1:105, 28)),
    sort(c(rep(1:28, 105))),
    sort(rep(rnorm(28), 105)),
    sample(0:1, 105*28, replace=TRUE),
    rep(rnorm(105), 28)
)
colnames(exdf) <- c("ID", "content", "b", "APMs", "Gf")
View(exdf)

This gives you a good idea of my dataset. Now I would like to turn it into something like this:

content     b           APMs
1           mean(b)     mean(APMs)
2           mean(b)     mean(APMs)
3           mean(b)     mean(APMs)
...         ...         ...
28          mean(b)     mean(APMs)

As you can see, Gf should be dropped, and b and APMs should each be averaged across the 105 IDs for each of the 28 contents. The closest I have come is the following, but it seems to handle only one variable.

library(reshape2)
itemwide <- dcast(
    data= exdf, 
    formula= content ~ "b",
    value.var= "b",
    fun.aggregate= mean, na.rm=TRUE
)
View(itemwide)

Note to self:
what David actually means is this:

itemwide <- aggregate(
    cbind(b, APMs) ~ content,  # passed by position: the name of this argument differs between R versions
    data= exdf, 
    FUN= mean, na.rm=TRUE
)
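
One thing worth knowing about na.rm=TRUE in the call above: the formula method of aggregate() defaults to na.action = na.omit, which drops every row containing an NA before FUN is applied. The following is a minimal sketch of the difference, using a small made-up data frame; the names `toy`, `dropped` and `kept` are hypothetical and not part of the original data.

```r
# Small hypothetical data set with one missing value in b
toy <- data.frame(
    content = c(1, 1, 2, 2),
    b       = c(1, NA, 3, 5),
    APMs    = c(0, 1, 1, 1)
)

# Default na.action = na.omit removes the whole row that contains the NA,
# so the APMs mean for content 1 is based on a single row
dropped <- aggregate(cbind(b, APMs) ~ content, toy, mean, na.rm = TRUE)

# na.pass keeps that row; na.rm = TRUE then skips the NA column by column
kept <- aggregate(cbind(b, APMs) ~ content, toy, mean,
                  na.rm = TRUE, na.action = na.pass)

dropped$APMs  # 0 1
kept$APMs     # 0.5 1
```

If the real data contain no missing values, both calls return the same result and the distinction does not matter.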

The solution that comes to mind is just `aggregate(cbind(b, APMs) ~ content, exdf, mean)` or just Googling. — David Arenburg, Oct 12 '15 at 19:29

Jaap · Accepted Answer · 2015-10-12T19:30:40.067

What you are trying to do is not reshaping your data, but summarising. So, you don't need dcast but a summarising function. You could try this:

library(data.table)
setDT(exdf)[, lapply(.SD, mean), by = content, .SDcols=c("b","APMs")]

This gives:

    content           b      APMs
 1:       1 -3.05332596 0.4666667
 2:       2 -2.06610577 0.5619048
 3:       3 -1.13791427 0.5714286
 4:       4 -0.92448090 0.4380952
 5:       5 -0.71275890 0.5047619
 6:       6 -0.63886781 0.4571429
 7:       7 -0.62661130 0.5428571
 8:       8 -0.53520089 0.4380952
 9:       9 -0.39673688 0.5523810
10:      10 -0.39221476 0.4761905
11:      11 -0.36342977 0.5714286
12:      12 -0.34620176 0.5142857
13:      13 -0.26971611 0.5428571
14:      14 -0.13581832 0.5333333
15:      15 -0.11093658 0.4571429
16:      16 -0.09053545 0.5333333
17:      17 -0.03242315 0.4666667
18:      18  0.08462955 0.4857143
19:      19  0.09506010 0.5333333
20:      20  0.28455671 0.4476190
21:      21  0.28534999 0.5523810
22:      22  0.48477913 0.4476190
23:      23  0.58413458 0.4380952
24:      24  0.95284381 0.5809524
25:      25  1.05399249 0.4952381
26:      26  1.13404533 0.3523810
27:      27  1.17226739 0.4476190
28:      28  1.76672474 0.5047619

In base R you can use (as @DavidArenburg said in the comments):

aggregate(cbind(b, APMs) ~ content, exdf, mean)
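
As a quick sanity check, here is a minimal sketch in base R only that rebuilds the example data from the question and confirms that the formula call returns one row per content with ID and Gf dropped. The names `res` and `manual_b` are introduced here purely for illustration.

```r
# Rebuild the example data from the question (same seed, same column order)
set.seed(2218)
exdf <- data.frame(
    ID      = rep(1:105, 28),
    content = sort(rep(1:28, 105)),
    b       = sort(rep(rnorm(28), 105)),
    APMs    = sample(0:1, 105 * 28, replace = TRUE),
    Gf      = rep(rnorm(105), 28)
)

# Mean of b and APMs per content; columns not named in the formula are dropped
res <- aggregate(cbind(b, APMs) ~ content, exdf, mean)

# Cross-check one column against per-group means computed with tapply()
manual_b <- tapply(exdf$b, exdf$content, mean)

dim(res)      # 28 rows, 3 columns
names(res)    # "content" "b" "APMs"
all.equal(res$b, as.vector(manual_b))
```

Only the variables named inside cbind() are summarised, so adding another column to the summary is a matter of listing it there.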
