I can't figure out how to calculate the mean for a subset of a column in R. Specifically, I want the mean of "expenditures" for "age" 40 and over, and for "age" under 40. I've tried

mean(expenditures[["age">=40]]) 

and gotten success, but

mean(expenditures[["age"<40]]) 

was not successful.

I'm stuck on this problem and would greatly appreciate any help with this seemingly simple question.

Ronak Shah
  • You got success with `"age">=40` because by itself it returns `TRUE`. And you have computed the mean of the entire vector. The right way would be `i <- expenditures[["age"]] >= 40; mean(expenditures[["age"]][i])` and `mean(expenditures[["age"]][!i])`. – Rui Barradas Aug 21 '18 at 04:44
  • Why does "age" >= 40 return as TRUE? Could I also do – quitethenovelty Aug 21 '18 at 05:09
  • i <- expenditures[["age"]] < 40 – quitethenovelty Aug 21 '18 at 05:10
  • Also, what you said didn't work... – quitethenovelty Aug 21 '18 at 05:15
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Ronak Shah Aug 21 '18 at 05:18
  • What is the name of your dataset? `expenditures`? Please edit **the question** with the output of `dput(head(data, 20))` replacing `data` by the name of your dataset. – Rui Barradas Aug 21 '18 at 06:39
  • *"Why does "age" >= 40 return TRUE?"* Because the ASCII code for "a" is 97 and for "4" is 52. – Rui Barradas Aug 21 '18 at 07:19
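
To make the comments above concrete, here is a minimal sketch of why the first attempt appeared to work and how logical indexing selects each subset. The data frame `df` and its values are made up for illustration, since the asker's data set was never posted; it is assumed that `age` and `expenditures` are columns of the same data frame.

```r
# Hypothetical data; the asker's real data set is not shown in the question.
df <- data.frame(
  age          = c(25, 38, 40, 52, 61),
  expenditures = c(100, 200, 300, 400, 500)
)

# "age" is a string literal, not the column. R coerces 40 to "40" and
# compares the two strings, so the result is a single logical value
# rather than one value per row.
"age" >= 40

# Compare the column itself to get one TRUE/FALSE per row.
is_40_plus <- df$age >= 40

# Use that logical vector to pick the matching expenditures.
mean_40_plus  <- mean(df$expenditures[is_40_plus])   # rows with age 40, 52, 61
mean_under_40 <- mean(df$expenditures[!is_40_plus])  # rows with age 25, 38
```

With this toy data, `mean_40_plus` is 400 and `mean_under_40` is 150. Note that the mean is taken over `expenditures`, while `age` is used only to build the filter.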

2 Answers

You could do it in one step with dplyr: create the grouping variable inside group_by(), then use summarise() to calculate the mean of each group:

library(dplyr)

data("mtcars")

mtcars %>%
  group_by(group = ifelse(hp > 100, "> 100", "<= 100")) %>%
  summarise(mean = mean(hp))

gives:

# A tibble: 2 x 2
  group   mean
  <chr>  <dbl>
1 <= 100  76.3
2 > 100   174.

Note: Thanks Tino for the tips!

Paul

If you don't want to use additional packages:

# some sample data:
set.seed(123)
df <- data.frame(age = sample(x = 20:50, size = 100, replace = TRUE),
                 expenditures = runif(n = 100, min = 100, max = 1000))

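# Pass the formula unnamed: this argument is called `formula` before R 4.2.0
# and `x` from R 4.2.0 on, so naming it is not portable.
aggregate(
  expenditures ~ age >= 40,
  data = df,
  FUN = mean
)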
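
As a further base-R option (not part of the original answer), `tapply()` can split one column by a logical condition on another. This is only a sketch: it reuses the simulated data from above, so the actual numbers depend on the random number generator of your R version.

```r
# Same simulated data as in the answer (illustrative values only).
set.seed(123)
df <- data.frame(age = sample(x = 20:50, size = 100, replace = TRUE),
                 expenditures = runif(n = 100, min = 100, max = 1000))

# Split expenditures by whether age >= 40 and take the mean of each group.
group_means <- tapply(df$expenditures, df$age >= 40, mean)

# The result has one element per group, named "FALSE" (< 40) and "TRUE" (>= 40).
group_means["TRUE"]
group_means["FALSE"]
```

Unlike `aggregate()`, which returns a data frame, this returns a small named array, which can be handier when you only need the two numbers.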

And to add to Paul's solution, you could also create the group within group_by:

library(dplyr)
# using dplyr:
df %>% 
  group_by(age >= 40) %>% 
  summarise_at(.vars = vars(expenditures), mean)
Tino