Determine the average value of a variable for a specific level in R

Question

I am trying to find the average value (mean) of one variable within each level of another variable that I created.

So far I have created a new variable with three levels:

level 1: values <= 0%,
level 2: values > 0% and < 1%, and
level 3: values >= 1%.

pincome$income_growth <- ifelse(pincome$incomechng <= 0, "level 1",
                                ifelse(pincome$incomechng < 1,"level 2","level 3"))

Now I want to determine the average of another variable within each of the levels above (e.g. the average income for level 1, where income growth is 0% or less).

I hope this makes sense. I'm very much a novice at R and still trying to get the hang of it!

I'm guessing the right way is something like `with(DF, ave(v, level))` or `with(DF, tapply(v, level))` where `DF` is your data.frame, `v` is your variable and `level` is your grouping variable. To learn more, type `?ave` and `?tapply`. — Frank, Sep 27 '16 at 01:20
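
A minimal sketch of the two calls suggested in the comment above, assuming a hypothetical `income` column (the question does not name the variable to be averaged) and made-up values. `tapply` returns one mean per level, while `ave` returns a vector as long as the data, with each row carrying its group's mean.

```r
# Made-up data; `income` is an assumed column name, not taken from the question
pincome <- data.frame(
  incomechng = c(-0.5, 0.2, 1.5, 0.0, 0.8, 2.0),
  income     = c(100, 200, 300, 400, 500, 600)
)
pincome$income_growth <- ifelse(pincome$incomechng <= 0, "level 1",
                         ifelse(pincome$incomechng < 1, "level 2", "level 3"))

# One mean per level (a named array with three entries)
level_means <- with(pincome, tapply(income, income_growth, mean))

# Mean for a single level
level1_mean <- level_means["level 1"]

# Group mean repeated for every row (same length as the data)
pincome$level_mean <- with(pincome, ave(income, income_growth))
```

The `ave` form is handy when the group mean is needed as a new column, for example to compare each row with its group average.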

score 0 · Answer 1 · edited May 23 '17 at 12:33

Try `by` (see `?by`) if you want base R. If you start doing more complicated things, the plyr and dplyr packages are very good, and if you work with very large datasets and don't mind a steeper initial learning curve, the data.table package is also excellent.

A reproducible example would be fantastic.

E.g.

set.seed(1) # so your random numbers are the same as mine
pincome <- data.frame(incomechng = runif(20, min=-1, max=3))

# What you had was fine too; `cut` (see ?cut) is another way to do it,
# shown here for demonstration purposes.
# Note that `cut` uses intervals like (a, b] or [a, b), whereas yours
# are (-Inf, 0], (0, 1), [1, Inf), which is slightly different.
pincome$income_growth <- cut(pincome$incomechng,
                             breaks=c(-Inf, 0, 1, Inf),
                             labels=paste("level", 1:3))
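
The boundary difference mentioned in the comments above can be checked on a few hand-picked values. This is only an illustrative sketch: the vector `x` is made up, and `cut` is used with its default `right = TRUE`, so the intervals are (-Inf, 0], (0, 1], (1, Inf).

```r
# Hand-picked values that include both break points (0 and 1)
x <- c(-0.5, 0, 0.5, 1, 1.5)

# Nested ifelse, as in the question: (-Inf, 0], (0, 1), [1, Inf)
by_ifelse <- ifelse(x <= 0, "level 1",
             ifelse(x < 1, "level 2", "level 3"))

# cut with default right = TRUE: (-Inf, 0], (0, 1], (1, Inf)
by_cut <- as.character(cut(x,
                           breaks = c(-Inf, 0, 1, Inf),
                           labels = paste("level", 1:3)))

# Side-by-side comparison; the two columns differ only where x is exactly 1
comparison <- data.frame(x, by_ifelse, by_cut)
```

With continuous data such as the `runif` sample, a value of exactly 1 is very unlikely, so the two approaches will normally agree.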

Now we can take the average within each group. I've shown three options; I'm sure there are more.

# base R ?by
by(pincome$incomechng, pincome$income_growth, mean)
# pincome$income_growth: level 1
# [1] -0.6848674
# ------------------------------------------
# pincome$income_growth: level 2
# [1] 0.4132334
# ------------------------------------------
# pincome$income_growth: level 3
# [1] 1.772039

# plyr (dplyr has pipe syntax you may prefer but is otherwise the same)
library(plyr)
ddply(pincome, .(income_growth), summarize, avgIncomeGrowth=mean(incomechng))
#   income_growth avgIncomeGrowth
# 1       level 1      -0.6848674
# 2       level 2       0.4132334
# 3       level 3       1.7720395

# data.table
library(data.table)
setDT(pincome)
pincome[, list(avgIncomeGrowth=mean(incomechng)), by=income_growth]
#    income_growth avgIncomeGrowth
# 1:       level 2       0.4132334
# 2:       level 3       1.7720395
# 3:       level 1      -0.6848674

score 0 · Answer 2 · answered Apr 20 '19 at 18:30

If you'd like a tidyverse solution:

library(tidyverse)
pincome %>%
  mutate(income_growth = case_when(incomechng <= 0 ~ "level 1",
                                   incomechng < 1 ~ "level 2",
                                   TRUE ~ "level 3")) %>%
  group_by(income_growth) %>%
  summarize(avgIncomeGrowth = mean(incomechng, na.rm = TRUE))
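
The `na.rm = TRUE` argument in the call above matters only when the data contain missing values. A small base R sketch of its effect, using made-up values with one `NA` in level 2:

```r
# Made-up values; one missing observation in level 2
df <- data.frame(
  income_growth = c("level 1", "level 1", "level 2", "level 2"),
  incomechng    = c(-0.4, -0.2, 0.5, NA)
)

# Without na.rm, a group containing NA gets NA as its mean
means_keep_na <- tapply(df$incomechng, df$income_growth, mean)

# With na.rm = TRUE, missing values are dropped before averaging
means_drop_na <- tapply(df$incomechng, df$income_growth, mean, na.rm = TRUE)
```

Here `means_keep_na` is `NA` for level 2, while `means_drop_na` reports the mean of the one remaining value.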
