I am working with this data in RStudio. I need to create a new data frame with the mean of lwage76 grouped by ed76 and regional.dummies; the data frame also needs to contain the number of observations in each of those groups. This is what I have so far:

agglwage <- aggregate(lwage76 ~ regional.dummies + ed76, nlsdata, mean)
head(agglwage)

#    regional.dummies ed76  lwage76
#1                  7    1 6.214608
#2                  6    2 5.682503
#3                  2    3 5.746203

So far so good.

library(plyr)  # count() with a character vector of column names is plyr's
dfcount <- count(nlsdata, c("regional.dummies", "ed76"))
head(dfcount, n = 3)

#  regional.dummies ed76 freq
#1                1    7    2
#2                1    9    4
#3                1   10    6

I could probably merge the two data frames now, but is there a more efficient way of doing this?
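For reference, the merge step mentioned above could look roughly like the sketch below. The toy `nlsdata` is a hypothetical stand-in (the values are made up, not taken from the real data set), and the counts are computed with base R's `aggregate()` and `length()` rather than `plyr::count()` so that the example needs no extra packages.

```r
# Toy stand-in for nlsdata (hypothetical values, not the real data)
nlsdata <- data.frame(
  regional.dummies = c(1, 1, 1, 2, 2),
  ed76             = c(7, 7, 9, 7, 7),
  lwage76          = c(6.0, 6.4, 5.8, 6.1, 6.3)
)

# Group means, as in the question
agglwage <- aggregate(lwage76 ~ regional.dummies + ed76, nlsdata, mean)

# Group sizes: length() counts the rows that fall in each group
dfcount <- aggregate(lwage76 ~ regional.dummies + ed76, nlsdata, length)
names(dfcount)[names(dfcount) == "lwage76"] <- "freq"

# Join the two summaries on the shared grouping columns
result <- merge(agglwage, dfcount, by = c("regional.dummies", "ed76"))
result
```

Note that the formula interface of `aggregate()` drops rows with missing values by default, so these counts could differ from `plyr::count()` if lwage76 contains NAs.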

Sotos
Collective Action
    [Relevant link](https://stackoverflow.com/questions/12064202/using-aggregate-to-apply-several-functions-on-several-variables-in-one-call) – Sotos Nov 20 '17 at 10:20
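In the spirit of the linked question, a single `aggregate()` call can return both statistics if the function returns a named vector. This is only a sketch: the toy `nlsdata` is a hypothetical stand-in with made-up values, and the resulting column names (`lwage76.mean`, `lwage76.freq`) come from the names chosen here, not from the original post.

```r
# Toy stand-in for nlsdata (hypothetical values, not the real data)
nlsdata <- data.frame(
  regional.dummies = c(1, 1, 1, 2, 2),
  ed76             = c(7, 7, 9, 7, 7),
  lwage76          = c(6.0, 6.4, 5.8, 6.1, 6.3)
)

# One pass: return the mean and the group size together
agg <- aggregate(
  lwage76 ~ regional.dummies + ed76,
  data = nlsdata,
  FUN  = function(x) c(mean = mean(x), freq = length(x))
)

# aggregate() stores the two statistics as one matrix column;
# do.call(data.frame, ...) flattens it into ordinary columns
result <- do.call(data.frame, agg)
result
```

The flattening step matters because, without it, `agg$lwage76` is a two-column matrix inside the data frame, which is awkward to subset or merge later.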

1 Answer

We can use dplyr. After grouping by 'regional.dummies' and 'ed76', get the number of rows in each group with n() and the mean of 'lwage76':

library(dplyr)
nlsdata %>%
  group_by(regional.dummies, ed76) %>%
  summarise(freq = n(), lwage76 = mean(lwage76, na.rm = TRUE))
akrun