
I can't wrap my mind around the ave function. I read the help and searched the net, but I still cannot understand what it does. I understand that it applies some function to subsets of the observations, but not in the same way as, for example, tapply.

Could someone please enlighten me perhaps with a small example?

Thanks, and excuse me for perhaps an unusual request.

Blue Magister
ECII

1 Answer


tapply returns one result per factor level. ave applies the function to the same groups, but it returns a vector the same length as the input, with each group's result placed at that group's positions in the original data.

ave is handy for producing a new column in a data frame with summary data.

A short example:

tapply(iris$Sepal.Length, iris$Species, FUN=mean)
    setosa versicolor  virginica 
     5.006      5.936      6.588 

One value, the mean for each factor level.

ave on the same data returns 150 values, one per row of iris, so the result lines up with the original data frame:

ave(iris$Sepal.Length, iris$Species, FUN=mean)
  [1] 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006
 [17] 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006
 [33] 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006
 [49] 5.006 5.006 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936
 [65] 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936
 [81] 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936
 [97] 5.936 5.936 5.936 5.936 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588
[113] 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588
[129] 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588
[145] 6.588 6.588 6.588 6.588 6.588 6.588

As noted in the comments, here the single value is being recycled to fill each location in the original data.
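To make the relationship between the two calls concrete, here is a minimal sketch that rebuilds the ave result by hand from the tapply result. The variable names (x, g, group_means, by_hand, via_ave) are only illustrative, and this is not how ave is implemented internally; it just shows that the values agree.

```r
# Sketch: reproduce ave() by looking up each row's group in the tapply() result.
x <- iris$Sepal.Length
g <- iris$Species

group_means <- tapply(x, g, FUN = mean)                 # one mean per species
by_hand <- as.vector(group_means[as.character(g)])      # one mean per row, attributes dropped

via_ave <- ave(x, g, FUN = mean)                        # same thing in one call

isTRUE(all.equal(by_hand, via_ave))                     # TRUE
length(via_ave) == nrow(iris)                           # TRUE
```

So for a function that returns one value per group, ave is roughly "tapply, then spread the results back over the rows".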

If the function returns multiple values, they are assigned in order to the group's positions, and recycled if there are fewer values than positions. For example:

d <- data.frame(a=rep(1:2, each=5), b=1:10)
ave(d$b, d$a, FUN=rev)
 [1]  5  4  3  2  1 10  9  8  7  6
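Because the result always has one value per input element, it can be stored straight into the data frame as a new column. A small sketch on the same toy data (the column names b_cumsum and b_mean are made up for illustration):

```r
d <- data.frame(a = rep(1:2, each = 5), b = 1:10)

# Running total of b within each level of a (FUN returns one value per element).
d$b_cumsum <- ave(d$b, d$a, FUN = cumsum)

# Group mean of b, repeated on every row of its group (FUN returns one value per group).
d$b_mean <- ave(d$b, d$a, FUN = mean)

d$b_cumsum   # 1 3 6 10 15 6 13 21 30 40
d$b_mean     # 3 3 3 3 3 8 8 8 8 8
```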

Thanks to Josh and thelatemail.

Matthew Lundberg
  • Oops. Posted my own answer (now deleted) without seeing yours. I'd suggest one minor correction, which is that neither `ave` nor `tapply` need produce a single result per factor level. Set `FUN=cumsum`, or some such, to see that. – Josh O'Brien Mar 09 '14 at 23:37
  • `tapply` does produce a single result per factor level. This result may just be multiple values, lists, other objects etc. `ave` can also return multiple values, and will do so (possibly sensibly) if the number of values returned matches up with the length of the input vector. e.g. `ave(1:10,rep(1:2,each=5),FUN=rev)` – thelatemail Mar 09 '14 at 23:51
  • I'd suggest using `rep(1:2, each=5)` for the example. It makes it easier to identify the reversing within the groups. – thelatemail Mar 10 '14 at 00:12
  • @thelatemail Yes, that is better. – Matthew Lundberg Mar 10 '14 at 00:13
  • It could also be confused with the aggregate function – skan Jun 15 '16 at 16:21