How do I aggregate by group and add the column to the dataframe?

Question

I am working with a data frame, nlsdata, in RStudio. I need to create a new data frame with the mean of lwage76 grouped by ed76 and regional.dummies; it should also contain the number of observations in each group. This is what I have so far:

agglwage <- aggregate(lwage76 ~ regional.dummies + ed76, nlsdata, mean)
head(agglwage)

#    regional.dummies ed76  lwage76
#1                  7    1 6.214608
#2                  6    2 5.682503
#3                  2    3 5.746203

So far so good.

library(plyr)  # count() taking column names as strings (and the freq column) is plyr's
dfcount <- count(nlsdata, c("regional.dummies", "ed76"))
head(dfcount, n = 3)

#  regional.dummies ed76 freq
#1                1    7    2
#2                1    9    4
#3                1   10    6

I could probably merge the two data frames now, but is there a more efficient way to do this?
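
For reference, the merge route mentioned above might look roughly like the sketch below. It uses only base R, and the `toy` data frame is a hypothetical stand-in for nlsdata (only the column names are taken from the question). The counts are computed here with `aggregate(..., length)` rather than `count()`, so the example has no package dependency.

```r
# Hypothetical toy data standing in for nlsdata
toy <- data.frame(
  regional.dummies = c(1, 1, 1, 2, 2),
  ed76             = c(7, 7, 9, 7, 7),
  lwage76          = c(6.0, 6.4, 5.8, 6.1, 6.3)
)

# Group means, as in the question
means <- aggregate(lwage76 ~ regional.dummies + ed76, toy, mean)

# Group sizes: same grouping, but count the values instead of averaging them
counts <- aggregate(lwage76 ~ regional.dummies + ed76, toy, length)
names(counts)[names(counts) == "lwage76"] <- "freq"

# Join the two summaries on the grouping columns
merged <- merge(means, counts, by = c("regional.dummies", "ed76"))
merged
```

This works, but it walks over the data twice and needs the extra join, which is the inefficiency the question is asking about.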

[Relevant link](https://stackoverflow.com/questions/12064202/using-aggregate-to-apply-several-functions-on-several-variables-in-one-call) — Sotos, Nov 20 '17 at 10:20

score 2 · Accepted Answer · answered Nov 20 '17 at 10:19

We can use dplyr: after grouping by 'regional.dummies' and 'ed76', get the number of rows with n() and the mean of 'lwage76'.

library(dplyr)
nlsdata %>%
  group_by(regional.dummies, ed76) %>%
  summarise(freq = n(), lwage76 = mean(lwage76, na.rm = TRUE))
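
To try this without the original data, here is a minimal sketch on a hypothetical `toy` data frame (the column names come from the question; the values are made up). It also illustrates one point worth keeping in mind: `n()` counts every row in the group, including rows where lwage76 is `NA`, while `mean(..., na.rm = TRUE)` ignores them. The `n_nonmissing` column and the `.groups = "drop"` argument (available in dplyr 1.0.0 and later) are additions for illustration and are not part of the answer above.

```r
library(dplyr)

# Hypothetical toy data standing in for nlsdata; note the NA in lwage76
toy <- data.frame(
  regional.dummies = c(1, 1, 1, 2, 2),
  ed76             = c(7, 7, 9, 7, 7),
  lwage76          = c(6.0, 6.4, 5.8, NA, 6.3)
)

res <- toy %>%
  group_by(regional.dummies, ed76) %>%
  summarise(
    freq         = n(),                        # all rows in the group
    n_nonmissing = sum(!is.na(lwage76)),       # rows that enter the mean
    lwage76      = mean(lwage76, na.rm = TRUE),
    .groups      = "drop"                      # return an ungrouped tibble
  )

res
```

If the count should reflect only the observations used for the mean, `sum(!is.na(lwage76))` is probably the column to keep instead of `n()`.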
