1

Using R 3.5, R studio 1.1.419.

I have a dataset that contains geographic data and a city based measures.

    zip state        city statefips finmei14 finmei15
1   501    NY  Holtsville        36       NA       NA
2   544    NY  Holtsville        36       NA       NA
3  1001    MA      Agawam        25       NA       NA
4  1002    MA     Amherst        25       69       64
5  1003    MA     Amherst        25       69       64
6  1004    MA     Amherst        25       69       64
7  1005    MA       Barre        25       NA       NA
8  1007    MA Belchertown        25       NA       NA
9  1008    MA   Blandford        25       NA       NA
10 1009    MA  Bondsville        25       NA       NA

finmei14 and finmei15 are city based measures that I want to aggregate by state so there will be a new variable that takes the average of the city measures into a new state measure (stat14 and stat15). I tried to use group_by and mutate

testdat %>% 
  group_by(state) %>%
  mutate (stat14=mean(finmei14))

My result was stat14 having nothing but NA's.

    zip state        city statefips finmei14 finmei15 stat14
1   501    NY  Holtsville        36       NA       NA     NA
2   544    NY  Holtsville        36       NA       NA     NA
3  1001    MA      Agawam        25       NA       NA     NA
4  1002    MA     Amherst        25       69       64     NA
5  1003    MA     Amherst        25       69       64     NA
6  1004    MA     Amherst        25       69       64     NA
7  1005    MA       Barre        25       NA       NA     NA
8  1007    MA Belchertown        25       NA       NA     NA
9  1008    MA   Blandford        25       NA       NA     NA
10 1009    MA  Bondsville        25       NA       NA     NA

I've also tried to add na.rm=TRUE but it returned this result

testdat %>% 
  group_by(state) %>%
  mutate (stat14=mean(finmei14), na.rm=TRUE)

     zip state city        statefips finmei14 finmei15 stat14 na.rm
 1  501. NY    Holtsville        36.      NA       NA      NA TRUE 
 2  544. NY    Holtsville        36.      NA       NA      NA TRUE 
 3 1001. MA    Agawam            25.      NA       NA      NA TRUE 
 4 1002. MA    Amherst           25.      69.      64.     NA TRUE 
 5 1003. MA    Amherst           25.      69.      64.     NA TRUE 
 6 1004. MA    Amherst           25.      69.      64.     NA TRUE 
 7 1005. MA    Barre             25.      NA       NA      NA TRUE 
 8 1007. MA    Belchertown       25.      NA       NA      NA TRUE 
 9 1008. MA    Blandford         25.      NA       NA      NA TRUE 
10 1009. MA    Bondsville        25.      NA       NA      NA TRUE 

What I want to see (based on this sample) is

    zip state        city statefips finmei14 finmei15 stat14
1   501    NY  Holtsville        36       NA       NA     NA
2   544    NY  Holtsville        36       NA       NA     NA
3  1001    MA      Agawam        25       NA       NA     69
4  1002    MA     Amherst        25       69       64     69
5  1003    MA     Amherst        25       69       64     69
6  1004    MA     Amherst        25       69       64     69
7  1005    MA       Barre        25       NA       NA     69
8  1007    MA Belchertown        25       NA       NA     69
9  1008    MA   Blandford        25       NA       NA     69
10 1009    MA  Bondsville        25       NA       NA     69

Thank you.

Elombardi
  • 15
  • 4
  • 2
    no problem here using `mutate(stat14=mean(finmei14, na.rm = T))`. Please be aware that the `na.rm = TRUE` must be set within the `mean()` function. – Roman Apr 26 '18 at 14:26
  • 2
    You have to give the `na.rm = T` argument to `mean` not to `mutate` – divibisan Apr 26 '18 at 14:28

0 Answers0