Average columns in a dataframe by groups

Question

My data.frame

df <- data.frame(ID=rep(1:3, 3), Obs_1=rnorm(9), Obs_2=rnorm(9), Obs_3=rnorm(9))

I want to calculate the mean of each colum by the ID. I have tried two approaches: 1)

ag <- aggregate(. ~ ID, df, function(x) c(mean = mean(x)))

returns

  ID      Obs_1      Obs_2      Obs_3
1  1 0.41220831 -0.9999704 -0.7234958
2  2 0.03564336  0.5014259  0.4847635
3  3 0.05647885  0.2067311 -0.0542368

Why does it does not indicate "mean" in the column name and how can make this reported? something like this:

Obs_1.mean  Obs_2.mean Obs_3.mean

2) the second approach

 df[, c(mean = lapply(.SD, mean), sd = lapply(.SD, sd)), by = ID]

it gives

unused argument (by = ID)

Thank you

simply rename the columns: `colnames(df) <- c('ID','Obs_1.mean','Obs_2.mean','Obs_3.mean')` — Barranka, Sep 04 '15 at 17:40
For the (2) you need to have df as a data.table. Like : `df = data.table(df); df[, c(mean = lapply(.SD, mean), sd = lapply(.SD, sd)), by = ID]` — AntoniosK, Sep 04 '15 at 17:46
You need to specify the packages that you use in your question!! The second approach will not work without `data.table::data.table(df)` — alexwhitworth, Sep 04 '15 at 17:48
but I think you can use the solutions in the comments to [this question](http://stackoverflow.com/questions/31819208/data-table-syntax-for-split-apply-combine-ala-plyr) to get what you want. — alexwhitworth, Sep 04 '15 at 17:50

WhiteViking · Accepted Answer · 2015-09-04T18:35:03.340

Here is one approach:

df <- data.frame(ID=rep(1:3, 3),
                 Obs_1=rnorm(9),
                 Obs_2=rnorm(9),
                 Obs_3=rnorm(9))

ag <- aggregate(. ~ ID, df, FUN = mean)

colnames(ag) <- sapply(colnames(ag),
                       FUN = function(x) { paste0(x, ifelse(x=="ID", "", ".mean")) })

It yields:

  ID Obs_1.mean Obs_2.mean Obs_3.mean
1  1  0.2675131 -0.3494175 -0.3128264
2  2  1.0988356 -0.6645116 -0.2192402
3  3 -1.1569562 -0.4040192 -0.1265475

About your question why the trick

FUN = function(x) { c(mean = mean(x)) })

to stick the name "mean" to the result of FUN does not work:

I suppose that is due to the grouping that aggregate() performs. Each group might in theory return a different name. Which of these potentially conflicting names should aggregate() use then to name the complete column of the aggregated dataframe...?

If no grouping happens, like here,

sapply(df, FUN = function(x) { c(mean = mean(x)) })

we get the expected names:

 ID.mean   Obs_1.mean  Obs_2.mean  Obs_3.mean 
2.0000000   0.4265256  -0.2046452  -0.4496054

In regard to `sapply(df, FUN = function(x) { c(mean = mean(x)) })` to avoid also the average of ID column: `sapply(df[ ,2:length(df)], FUN = function(x) { c(mean = mean(x))})` — Al14, Sep 04 '15 at 19:07

Average columns in a dataframe by groups

1 Answers1