Calculate mean for subset of column

Question

I can't figure out how to calculate the mean for a subset of a column in R. Specifically, I want the mean of "expenditures" for "age" 40+ and for "age" <40. I've tried

mean(expenditures[["age">=40]])

and gotten success, but

mean(expenditures[["age"<40]])

was not successful.

I am therefore stuck on this problem. I'll greatly appreciate any help on this seemingly simple question.

You got success with `"age">=40` because by itself it returns `TRUE`. And you have computed the mean of the entire vector. The right way would be `i <- expenditures[["age"]] >= 40; mean(expenditures[["age"]][i])` and `mean(expenditures[["age"]][!i])`. — Rui Barradas, Aug 21 '18 at 04:44
Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — Ronak Shah, Aug 21 '18 at 05:18
What is the name of your dataset? `expenditures`? Please edit **the question** with the output of `dput(head(data, 20))` replacing `data` by the name of your dataset. — Rui Barradas, Aug 21 '18 at 06:39
*"Why does "age" >= 40 return TRUE?"* Because the ASCII code for "a" is 97 and for "4" is 52. — Rui Barradas, Aug 21 '18 at 07:19
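Building on the comments above, here is a minimal base R sketch of subsetting with a logical vector. The data frame name `df` and its values are assumptions made up for illustration; only the column names `age` and `expenditures` come from the question. It shows that comparing the string "age" with 40 gives a single logical value, while comparing the column itself gives one logical value per row that can be used to subset.

```r
# Hypothetical data; only the column names come from the question.
df <- data.frame(
  age          = c(25, 38, 40, 52, 61),
  expenditures = c(100, 200, 300, 400, 500)
)

# Comparing the *string* "age" with 40 coerces 40 to "40" and compares
# text, so the result has length 1 instead of one value per row.
length("age" >= 40)

# Comparing the *column* gives one logical value per row.
is_40_plus <- df[["age"]] >= 40

# Use that logical vector to subset the expenditures column.
mean_40_plus  <- mean(df[["expenditures"]][is_40_plus])   # rows with age >= 40
mean_under_40 <- mean(df[["expenditures"]][!is_40_plus])  # rows with age < 40
```

With this toy data, `mean_40_plus` averages 300, 400 and 500, and `mean_under_40` averages 100 and 200.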

Paul · Answer 1 · 2018-08-21T06:14:18.447

score 2

You can do it in one step: create the grouping column inside group_by(), then use summarise() to calculate the mean of each group:

library(dplyr)

data("mtcars")

mtcars %>%
  group_by(group = ifelse(hp > 100, "> 100", "<= 100")) %>%
  summarise(mean = mean(hp))

gives:

# A tibble: 2 x 2
  group   mean
  <chr>  <dbl>
1 <= 100  76.3
2 > 100   174.

Note: Thanks Tino for the tips!
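As a cross-check that does not need dplyr, the same grouped mean can be sketched in base R with tapply(). The variable names `group` and `group_means` are illustrative, not part of the answer.

```r
# Built-in dataset, same as in the answer above.
data("mtcars")

# Label each row by whether hp exceeds 100.
group <- ifelse(mtcars$hp > 100, "> 100", "<= 100")

# tapply() splits hp by the labels and applies mean() to each piece.
group_means <- tapply(mtcars$hp, group, mean)
group_means
```

The two values should agree with the tibble shown above, up to rounding.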

edited Aug 21 '18 at 06:14

answered Aug 21 '18 at 04:51

Paul

score 1 · Answer 2 · answered Aug 21 '18 at 06:07

If you don't want to use additional packages:

# some sample data:
set.seed(123)
df <- data.frame(age = sample(x = 20:50, size = 100, replace = TRUE),
                 expenditures = runif(n = 100, min = 100, max = 1000))

aggregate(
  expenditures ~ age >= 40,  # formula passed unnamed: the name of this argument differs between R versions
  data = df,
  FUN = mean
)
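The grouping column in that result holds FALSE/TRUE for the condition `age >= 40`. One possible variation, sketched here with the hypothetical column name `age_group`, stores readable labels first and then aggregates on them. The sample data is regenerated so the snippet runs on its own.

```r
# Same sample data as above, regenerated so this runs on its own.
set.seed(123)
df <- data.frame(age = sample(x = 20:50, size = 100, replace = TRUE),
                 expenditures = runif(n = 100, min = 100, max = 1000))

# Hypothetical helper column with readable group labels.
df$age_group <- ifelse(df$age >= 40, "40+", "<40")

# The formula is passed unnamed, so the call does not depend on the
# name of aggregate()'s first argument.
result <- aggregate(expenditures ~ age_group, data = df, FUN = mean)
result
```

Each row of `result` is one age group with its mean expenditure, which should match `mean(df$expenditures[df$age >= 40])` and its complement.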

And to add to Paul's solution, you could also create the group within group_by:

library(dplyr)
# using dplyr:
df %>% 
  group_by(age >= 40) %>% 
  summarise_at(.vars = vars(expenditures), mean)
