
In the following data frame, I need to take the mean of `i` for each of the groups a, b, and c:

values <- data.frame(value = c("a", "a", "a", "a", "a", 
                           "b", "b", "b", 
                           "c", "c", "c", "c"), i = c(1,2,3,4,5,6,7,8,9,10,11,12))

To achieve this, I tried the `aggregate` function as follows:

agg <- aggregate(values, by = list(values$value), FUN = mean)

The output does contain the per-group means of `i`, but I doubt this is the correct approach: the `value` column comes back as NA, and the call throws several warnings.

Warning messages:
1: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
> agg
  Group.1 value    i
1       a    NA  3.0
2       b    NA  7.0
3       c    NA 10.5
Suhail Gupta

2 Answers


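The warnings appear because `aggregate` applies `mean` to every column, including the non-numeric `value`. Converting each column to numeric first removes them: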

aggregate(values, by = list(values$value), FUN = function(x) mean(as.numeric(x)))

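Which returns the following when `value` is a factor (the `data.frame` default before R 4.0), since `as.numeric` then yields the factor's integer codes: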

  Group.1 value    i
1       a     1  3.0
2       b     2  7.0
3       c     3 10.5
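
If only the mean of `i` is wanted, a minimal sketch of a cleaner variant is to hand `aggregate` just the numeric column, so `mean` is never applied to `value`. It rebuilds the data frame from the question; the name `agg` and the `value = ` label in the `by` list are choices made here for illustration.

```r
# Same data as in the question
values <- data.frame(
  value = c(rep("a", 5), rep("b", 3), rep("c", 4)),
  i = 1:12
)

# Aggregate only the numeric column. Naming the list element
# labels the grouping column "value" instead of "Group.1".
agg <- aggregate(values["i"], by = list(value = values$value), FUN = mean)
agg
```

This should run without warnings whether `value` is stored as a factor or as a character vector.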

Alternatively, we could use the formula interface with dot notation, which aggregates every other column by `value`:

aggregate(. ~ value, values, mean)

Or name the column to aggregate explicitly in the formula:

aggregate(i ~ value, values, mean)

Both return:

  value    i
1     a  3.0
2     b  7.0
3     c 10.5
tyluRp
  • What if `value` is associated with a third variable that also repeats? For example, in `aggregate(AverageTemperature ~ Year, americanCities, mean)`, _Year_ is associated with multiple cities. The formula above averages each year over all cities. How can I also split by city? _Note: each year itself appears 12 times._ – Suhail Gupta Mar 17 '18 at 07:13
  • You could use `+`. Something like `aggregate(breaks ~ wool + tension, warpbreaks, mean)` – tyluRp Mar 17 '18 at 07:20
  • Also, is it possible to aggregate based on a condition? For example, I want to aggregate where `year >= 1800 and year < 1900` and second where `year >= 1900`? – Suhail Gupta Mar 17 '18 at 08:06
  • You could subset the data like so, `aggregate(breaks ~ wool + tension, warpbreaks[warpbreaks$wool == "A", ], mean)` – tyluRp Mar 17 '18 at 08:15
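
The two suggestions in the comments can be combined. The sketch below is only an illustration under stated assumptions: `temps` is made-up data standing in for the `americanCities` data frame (which is not shown in the thread), and the `Period` column and its labels are hypothetical names. It uses `cut` to label each row with its year range, then groups by city and period with `+`.

```r
# Hypothetical data standing in for americanCities
temps <- data.frame(
  City = rep(c("Boston", "Chicago"), each = 4),
  Year = rep(c(1850, 1880, 1910, 1950), times = 2),
  AverageTemperature = c(8, 10, 9, 11, 7, 9, 10, 12)
)

# Label each row with its period: [1800, 1900) or [1900, Inf).
# right = FALSE makes the intervals closed on the left.
temps$Period <- cut(temps$Year,
                    breaks = c(1800, 1900, Inf),
                    right = FALSE,
                    labels = c("1800-1899", "1900+"))

# One mean per city and period
by_period <- aggregate(AverageTemperature ~ City + Period, temps, mean)
by_period
```

Compared with subsetting once per condition, this produces all ranges in a single call; rows with a year before 1800 get `NA` as their period and are dropped by the formula interface.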

Here is another easy solution using dplyr:

library(dplyr)

values %>%
  group_by(value) %>%
  summarise(i = mean(i))

# A tibble: 3 x 2
   value     i
  <fctr> <dbl>
1      a   3.0
2      b   7.0
3      c  10.5
tifu