2

I am attempting to aggregate data using multiple functions. It works well except the output isn't what it says it is.

df<-data.frame(PLTID=rep(paste(letters), 5), R=rnorm(130, 150, 40), G=rnorm(130, 100, 50), B=rnorm(130, 200, 25))

agg.data<-aggregate(data=df, . ~ PLTID, FUN = function(x) c(mean=mean(x, na.rm=TRUE), sd=sd(x, na.rm=TRUE), n=length(x)))

> head(agg.data)
  PLTID    R.mean      R.sd       R.n    G.mean      G.sd       G.n     B.mean       B.sd        B.n
1     a 144.29202  28.49934   5.00000  87.85852  76.28156   5.00000 192.230731  29.349837   5.000000
2     b 148.41993  31.37764   5.00000  84.14367  21.59658   5.00000 224.862334  15.769459   5.000000
3     c 158.89111  17.75320   5.00000 114.40942  53.81999   5.00000 205.931132   5.790658   5.000000
4     d 153.70118  34.68649   5.00000 137.97890  54.79668   5.00000 209.587936  21.877864   5.000000
5     e 154.28002  27.33376   5.00000 124.87306  71.09103   5.00000 225.774236  17.720090   5.000000
6     f 144.08148  37.30555   5.00000  57.15275  44.68034   5.00000 204.709050  18.207047   5.000000

This is what I want... except when I want to access the columns later they aren't actually there.

> names(agg.data)

[1] "PLTID" "R"     "G"     "B"



> str(agg.data)
'data.frame':   26 obs. of  4 variables:
 $ PLTID: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ R    : num [1:26, 1:3] 144 148 159 154 154 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "mean" "sd" "n"
 $ G    : num [1:26, 1:3] 87.9 84.1 114.4 138 124.9 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "mean" "sd" "n"
 $ B    : num [1:26, 1:3] 192 225 206 210 226 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "mean" "sd" "n"

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.8.10 plyr_1.8          gridExtra_0.9.1   rgl_0.93.963      ggplot2_0.9.3.1  

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4   dichromat_2.0-0    digest_0.6.3       gtable_0.1.2       labeling_0.2       MASS_7.3-29        munsell_0.4.2      proto_0.3-10       RColorBrewer_1.0-5
[10] reshape2_1.2.2     scales_0.2.3       stringr_0.6.2      tools_3.0.2   
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
bhive01
  • 251
  • 1
  • 12
  • A short workaround appears to be copying the output and pasting it into `agg.data2<-read.table(header=TRUE, text="")`. Really shouldn't have to though. – bhive01 Oct 30 '13 at 10:08
  • 1
    A similar phenomenon was discussed in [@DWin's answer here](http://stackoverflow.com/questions/12064202/using-aggregate-for-multiple-aggregations). – Henrik Oct 30 '13 at 10:21
  • 2
    @Henrik, I usually recommend `do.call(data.frame, agg.data)` to "flatten" this type of output. – A5C1D2H2I1M1N2O1R2T1 Oct 30 '13 at 10:26
  • Yup, that's the same phenomenon for sure. as.dataframe(as.list()) works. Not sure why this should be necessary. do.call(data.frame, agg.data) works too. – bhive01 Oct 30 '13 at 10:26
  • Is it normal to have to play with the output from aggregate() or is this something new? Previously I've only used plyr and only needed summary data from one column so I've never used aggregate() before now. – bhive01 Oct 30 '13 at 10:29
  • 1
    @bhive01, not necessarily. It is when you are passing multiple functions to a single `aggregate` command though, which is what you were doing. – A5C1D2H2I1M1N2O1R2T1 Oct 30 '13 at 10:31
  • 2
    Perhaps they should add a line to the help file with this example, or add a line of code when multiple inputs are detected to deal with this. Horribly confusing to see the output but not be able to use it. str() indicated it wasn't there at all... When I tried as.data.frame() on its own it did nothing. – bhive01 Oct 30 '13 at 10:35
  • 2
    @bhive01, I was confused by it too the first time I encountered it, and went so far as to write my own wrapper for `aggregate` until I found this `do.call(data.frame, ...)` approach. Other things to look out for is when it simplifies and when it creates a `list` as the output :) See: http://stackoverflow.com/q/14634633/1270695 – A5C1D2H2I1M1N2O1R2T1 Oct 30 '13 at 11:57

1 Answers1

4

Generally, when I've run into this problem, I use a do.call(data.frame, ...) to "flatten" the output. See the example below:

agg.data.2 <- do.call(data.frame, agg.data)
head(agg.data.2)
#   PLTID   R.mean     R.sd R.n    G.mean      G.sd G.n   B.mean     B.sd B.n
# 1     a 154.5559 33.43817   5 137.17159  26.21461   5 196.7478 28.59095   5
# 2     b 156.8467 32.62164   5 131.35181  48.94969   5 206.3617 17.97154   5
# 3     c 134.1172 21.83323   5  50.35166  71.57083   5 176.2187 38.82059   5
# 4     d 143.5015 43.31296   5 104.60027 103.66002   5 210.0669 18.40418   5
# 5     e 141.8565 27.02871   5  76.82525  47.76355   5 194.3454 30.87080   5
# 6     f 143.3337 12.14870   5 105.95271  42.22400   5 189.3063 12.11930   5
str(agg.data.2)
# 'data.frame': 26 obs. of  10 variables:
# $ PLTID : Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ R.mean: num  155 157 134 144 142 ...
# $ R.sd  : num  33.4 32.6 21.8 43.3 27 ...
# $ R.n   : num  5 5 5 5 5 5 5 5 5 5 ...
# $ G.mean: num  137.2 131.4 50.4 104.6 76.8 ...
# $ G.sd  : num  26.2 48.9 71.6 103.7 47.8 ...
# $ G.n   : num  5 5 5 5 5 5 5 5 5 5 ...
# $ B.mean: num  197 206 176 210 194 ...
# $ B.sd  : num  28.6 18 38.8 18.4 30.9 ...
# $ B.n   : num  5 5 5 5 5 5 5 5 5 5 ...
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485