
I am trying to add an aggregated column to a data frame using dplyr. Here is an example of what I have in mind:

gender <- c("male", "female", "male")
age <- c(25, 30, 56)
weight <- c(160, 110, 220)
mydata <- data.frame(gender, age, weight)

I group the data frame mydata by gender before making an aggregated calculation to find the average weight by gender:

library(dplyr)
mydata <- group_by(mydata, gender)
mydata2 <- summarise(mydata, wt=mean(weight))
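
This is why summarise() alone does not add a column. It collapses each group to a single row, so the result has one row per gender rather than one per person. A minimal self-contained check, assuming dplyr is installed:

```r
library(dplyr)

gender <- c("male", "female", "male")
age <- c(25, 30, 56)
weight <- c(160, 110, 220)
mydata <- data.frame(gender, age, weight)

# summarise() returns one row per group, not one row per original observation
mydata2 <- mydata %>%
  group_by(gender) %>%
  summarise(wt = mean(weight))

print(mydata2)
nrow(mydata2)  # 2 rows (female, male), while mydata has 3
```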

Is there any way of adding the column of average weight to the original data frame in the same step as above? In SQL, I would achieve this using the following line of code:

SELECT gender, age, weight, AVG(weight) OVER (PARTITION BY gender) AS avg_wt FROM mydata

I realize this is a very basic question, but I am new to R and I can't seem to find the answer anywhere.

udden2903

2 Answers


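Use mutate() instead of summarise(). Unlike summarise(), which collapses each group to one row, a grouped mutate() keeps every row and fills the new column with that row's group mean: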

mydata %>% group_by(gender) %>% mutate(wt = mean(weight))


#Source: local data frame [3 x 4]
#Groups: gender [2]
#
#  gender   age weight    wt
#  (fctr) (dbl)  (dbl) (dbl)
#1   male    25    160   190
#2 female    30    110   110
#3   male    56    220   190
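
A more verbose route mirrors the SQL habit of aggregating first and then joining: compute the per-group means with summarise() and merge them back onto every row with left_join(). It may also be worth calling ungroup() after the grouped mutate(), since its result stays grouped and later verbs would otherwise operate per group. A sketch, assuming a recent dplyr; the column name avg_wt is chosen for illustration:

```r
library(dplyr)

mydata <- data.frame(
  gender = c("male", "female", "male"),
  age    = c(25, 30, 56),
  weight = c(160, 110, 220)
)

# Option 1: grouped mutate, then drop the grouping so later verbs act on all rows
with_avg <- mydata %>%
  group_by(gender) %>%
  mutate(avg_wt = mean(weight)) %>%
  ungroup()

# Option 2: aggregate first, then join the group means back onto every row
group_means <- mydata %>%
  group_by(gender) %>%
  summarise(avg_wt = mean(weight))

joined <- left_join(mydata, group_means, by = "gender")

print(with_avg)
print(joined)
```

Both give avg_wt values of 190, 110, 190 in the original row order. The mutate() version is shorter, while the join version can be handy when the summary table is needed on its own as well.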
Sumedh

In case it is of interest, this can be accomplished in base R using the ave() function:

mydata$avg_wt <- ave(mydata$weight, mydata$gender, FUN=mean)

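The first argument is the variable to which the function is applied, the second is the grouping variable, and the third (FUN) is the function applied within each group. ave() returns a vector of the same length as its first argument, which is why it can be assigned directly as a new column.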
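
One detail worth knowing: the signature is ave(x, ..., FUN = mean), so every positional argument after x is treated as a grouping variable and FUN has to be passed by name. This also means several grouping variables can be combined. In the sketch below, the age_band column is hypothetical, derived from age purely for illustration:

```r
gender <- c("male", "female", "male")
age <- c(25, 30, 56)
weight <- c(160, 110, 220)
mydata <- data.frame(gender, age, weight)

# Signature is ave(x, ..., FUN = mean): positional arguments after x are
# grouping variables, so FUN must be given by name
mydata$avg_wt <- ave(mydata$weight, mydata$gender, FUN = mean)

# Several grouping variables can be supplied; this age band is a
# hypothetical grouping created only for illustration
age_band <- ifelse(mydata$age < 40, "under40", "40plus")
mydata$avg_wt_band <- ave(mydata$weight, mydata$gender, age_band, FUN = mean)

print(mydata)
```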

 mydata
  gender age weight avg_wt
1   male  25    160    190
2 female  30    110    110
3   male  56    220    190

Note that the default value of the FUN argument is mean, so the above code can be shortened to

mydata$avg_wt <- ave(mydata$weight, mydata$gender)

I included this argument in the answer above to show that, despite its name, ave can calculate other group statistics (including user-written functions).
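
For example, the same call pattern works with a built-in summary such as median, or with a user-written function. In this sketch, centre is a hypothetical helper that returns each weight's deviation from its group mean. FUN may return either a single value (recycled across the group) or a vector as long as the group:

```r
gender <- c("male", "female", "male")
weight <- c(160, 110, 220)

# A user-written function (hypothetical helper): deviation from the group mean
centre <- function(x) x - mean(x)
dev_wt <- ave(weight, gender, FUN = centre)

# A built-in alternative to mean: the group median
med_wt <- ave(weight, gender, FUN = median)

print(dev_wt)
print(med_wt)
```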

lmo
@DavidArenburg Thanks. You are right. I like to include it in the answer as a reminder that, despite its name, ave can be used to calculate other functions. I'll make a note of this in my answer. – lmo Jul 26 '16 at 21:39