
What is the difference between the ave() and mean() functions in R?

For example, I am trying to find the average of a particular column of a data frame in R.

I came across these two functions:

mean(dataset$age, na.rm= TRUE)
ave(dataset$age, FUN=function(x)mean(x, na.rm = TRUE))

The first function clearly gave me the mean as a single value, whereas the second also gave me the mean but returned a vector with as many elements as there are rows in the data frame. Why is that? And what is the use of a function like ave() when mean() neatly gives the average?

Shree
Arjun Raaghav
  • `ave` by default computes a `mean` by group and is typically used to create a new column, since it returns an output of the same `length` as the input, in the same order – akrun Aug 12 '19 at 14:58
  • @akrun - I didn't understand. Can you elaborate? – Arjun Raaghav Aug 12 '19 at 14:59
  • `ave(1:10, rep(1:2, each = 5))` returns the mean of the first 5 and of the next 5 elements, with an overall length of 10, while `mean(1:10)` returns a single value, the mean of all of 1:10. Regarding the order, consider `ave(1:5, c(3, 4, 1, 3, 4))`: the mean of each corresponding group is returned in the same order as `c(3, 4, 1, 3, 4)` – akrun Aug 12 '19 at 15:00

1 Answer


Elaborating on @akrun's comments -

Suppose x <- 1:10.

1) mean returns a vector of length 1.

mean(x)
[1] 5.5

2) ave always returns a vector of the same length as the input vector.

ave(x)
[1] 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5
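
The same holds when the input contains missing values, which is the case raised in the question. A minimal sketch, assuming a small made-up vector `age` (hypothetical, not the asker's data): with no grouping variable, ave fills every position, including the NA slot, with the value returned by FUN.

```r
# Hypothetical vector with one missing value (not from the original question)
age <- c(20, NA, 40)

# mean() collapses the input to a single number
single <- mean(age, na.rm = TRUE)

# ave() without a grouping variable assigns FUN's result to every position,
# so the output has the same length as the input, NA slot included
filled <- ave(age, FUN = function(v) mean(v, na.rm = TRUE))

print(single)
print(filled)
```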

The cool thing about ave is that you can also split x into groups and apply any function FUN to each group; the output is, again, of the same length as x -

Let's divide x into two groups of 3 and 7 elements, i.e. rep(1:2, c(3, 7))

(grouping <- rep(1:2, c(3,7)))
[1] 1 1 1 2 2 2 2 2 2 2

# Now calculating mean for each group -    
ave(x, grouping, FUN = mean)
[1] 2 2 2 7 7 7 7 7 7 7

# calculating sum for each group
ave(x, grouping, FUN = sum)
[1]  6  6  6 49 49 49 49 49 49 49

# any custom function can be applied to ave, not just mean
ave(x, grouping, FUN = function(a) sum(a^2))
[1]  14  14  14 371 371 371 371 371 371 371

The above results are similar to what you'd get from tapply, the difference being that the output of ave has the same length as x, whereas tapply returns one value per group.

tapply(x, grouping, mean)
1 2 
2 7 

tapply(x, grouping, sum)
1  2 
6 49 

tapply(x, grouping, function(a) sum(a^2))
1   2 
14 371
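
To make the relationship concrete, here is a small sketch that expands the tapply result back to one value per element and compares it with ave. The variable names `group_means`, `expanded` and `via_ave` are illustrative only; looking up each element's group by name is just one way to do the expansion.

```r
x <- 1:10
grouping <- rep(1:2, c(3, 7))

# One value per group, named "1" and "2"
group_means <- tapply(x, grouping, mean)

# Look up each element's group mean by the group's name
expanded <- as.numeric(group_means[as.character(grouping)])

# ave() produces the expanded form directly
via_ave <- as.numeric(ave(x, grouping, FUN = mean))

print(all.equal(expanded, via_ave))
```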

Finally, you can define your own function and pass it to the FUN argument of ave, so you are not restricted to just calculating the mean.

The "output length = input length" property makes ave very useful for adding columns to tabular data. Example: Calculate group mean (or other summary stats) and assign to original data
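
As a rough illustration of that use case, the sketch below adds a per-group mean column to a data frame. The data frame and its column names (`group`, `age`, `mean_age`) are made up for the example and are not taken from the question.

```r
# Hypothetical data; the column names are illustrative only
dataset <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  age   = c(20, 30, 40, NA, 60)
)

# Group mean of age, repeated on every row of its group.
# Because ave() returns one value per input element, the result
# can be assigned straight back as a new column.
dataset$mean_age <- ave(dataset$age, dataset$group,
                        FUN = function(v) mean(v, na.rm = TRUE))

print(dataset)
```

Doing the same with mean alone would give a single number for the whole column, and tapply would give one number per group that still has to be matched back to the rows.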

Shree