0

My data.frame

df <- data.frame(ID=rep(1:3, 3), Obs_1=rnorm(9), Obs_2=rnorm(9), Obs_3=rnorm(9))

I want to calculate the mean of each colum by the ID. I have tried two approaches: 1)

ag <- aggregate(. ~ ID, df, function(x) c(mean = mean(x)))

returns

  ID      Obs_1      Obs_2      Obs_3
1  1 0.41220831 -0.9999704 -0.7234958
2  2 0.03564336  0.5014259  0.4847635
3  3 0.05647885  0.2067311 -0.0542368

Why does it does not indicate "mean" in the column name and how can make this reported? something like this:

Obs_1.mean  Obs_2.mean Obs_3.mean  

2) the second approach

 df[, c(mean = lapply(.SD, mean), sd = lapply(.SD, sd)), by = ID]

it gives

unused argument (by = ID)

Thank you

Al14
  • 1,734
  • 6
  • 32
  • 55
  • simply rename the columns: `colnames(df) <- c('ID','Obs_1.mean','Obs_2.mean','Obs_3.mean')` – Barranka Sep 04 '15 at 17:40
  • 1
    For the (2) you need to have df as a data.table. Like : `df = data.table(df); df[, c(mean = lapply(.SD, mean), sd = lapply(.SD, sd)), by = ID]` – AntoniosK Sep 04 '15 at 17:46
  • You need to specify the packages that you use in your question!! The second approach will not work without `data.table::data.table(df)` – alexwhitworth Sep 04 '15 at 17:48
  • but I think you can use the solutions in the comments to [this question](http://stackoverflow.com/questions/31819208/data-table-syntax-for-split-apply-combine-ala-plyr) to get what you want. – alexwhitworth Sep 04 '15 at 17:50

1 Answers1

2

Here is one approach:

df <- data.frame(ID=rep(1:3, 3),
                 Obs_1=rnorm(9),
                 Obs_2=rnorm(9),
                 Obs_3=rnorm(9))

ag <- aggregate(. ~ ID, df, FUN = mean)

colnames(ag) <- sapply(colnames(ag),
                       FUN = function(x) { paste0(x, ifelse(x=="ID", "", ".mean")) })

It yields:

  ID Obs_1.mean Obs_2.mean Obs_3.mean
1  1  0.2675131 -0.3494175 -0.3128264
2  2  1.0988356 -0.6645116 -0.2192402
3  3 -1.1569562 -0.4040192 -0.1265475

About your question why the trick

FUN = function(x) { c(mean = mean(x)) })

to stick the name "mean" to the result of FUN does not work:

I suppose that is due to the grouping that aggregate() performs. Each group might in theory return a different name. Which of these potentially conflicting names should aggregate() use then to name the complete column of the aggregated dataframe...?

If no grouping happens, like here,

sapply(df, FUN = function(x) { c(mean = mean(x)) })

we get the expected names:

 ID.mean   Obs_1.mean  Obs_2.mean  Obs_3.mean 
2.0000000   0.4265256  -0.2046452  -0.4496054 
WhiteViking
  • 3,146
  • 1
  • 19
  • 22
  • In regard to `sapply(df, FUN = function(x) { c(mean = mean(x)) })` to avoid also the average of ID column: `sapply(df[ ,2:length(df)], FUN = function(x) { c(mean = mean(x))})` – Al14 Sep 04 '15 at 19:07