
I'm trying to compute a group-wise "weighted" mean with multiple grouping variables, excluding each row's own value. This is related to my earlier post Get group mean with multiple grouping variables and excluding own group value, but when I applied that solution to my actual problem (the weighted mean), I found that it is much more complicated than the simple mean. Here's what I mean.

library(dplyr)

df <- tibble(
  state = rep(c("AL", "CA"), each = 6),
  county = rep(letters[1:6], each = 2),
  year = rep(c(2011:2012), 6),
  value = c(91,46,37,80,33,97,4,19,85,90,56,94),
  wt = c(1,4,3,5,1,4,5,1,5,5,4,1)
) %>% arrange(state, year)

For the unweighted mean, the following code (from the accepted answer to my earlier post) works:

df %>%
  group_by(state, year) %>%
  mutate(q = (sum(value) - value) / (n()-1))

The desired column new_val is the mean of the other rows in the same state/year group, weighted by wt. For instance, the first two rows of new_val are calculated as 37*3/4 + 33*1/4 = 36 and 91*1/2 + 33*1/2 = 62:

# A tibble: 12 x 6
   state county  year value    wt new_val
   <chr> <chr>  <int> <dbl> <dbl>   <dbl>
 1 AL    a       2011    91     1    36  
 2 AL    b       2011    37     3    62
 3 AL    c       2011    33     1    50.5
 4 AL    a       2012    46     4    87.6  
 5 AL    b       2012    80     5    71.5
 6 AL    c       2012    97     4    64.9
 7 CA    d       2011     4     5    72.1
 8 CA    e       2011    85     5    27.1
 9 CA    f       2011    56     4    44.5
10 CA    d       2012    19     1    90.7
11 CA    e       2012    90     5    56.5
12 CA    f       2012    94     1    78.2
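Written out, the target for row i in its state/year group G (with x = value and w = wt) is the weighted mean of the other rows, which can be expressed through group totals in the same way as the unweighted formula above. This is a restatement of the examples in the question, not an additional requirement:

```latex
\text{new\_val}_i
  = \frac{\sum_{j \in G,\ j \neq i} w_j x_j}{\sum_{j \in G,\ j \neq i} w_j}
  = \frac{\sum_{j \in G} w_j x_j \;-\; w_i x_i}{\sum_{j \in G} w_j \;-\; w_i}
```

With all weights equal to 1 this reduces to (sum(value) - value) / (n() - 1). As a check for row 1 (AL, 2011): the group totals are 91*1 + 37*3 + 33*1 = 235 and 1 + 3 + 1 = 5, so (235 - 91) / (5 - 1) = 36.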

I searched for similar posts about the weighted mean, but all the ones I found dealt with the simple mean. Any comments would be greatly appreciated. Thank you!

qnp1521

1 Answer


We can use purrr::map_dbl to iterate over the row numbers within each group and drop the current row (value[-.x], wt[-.x]) from the weighted.mean calculation:

library(dplyr)

df %>%
  group_by(state, year) %>%
  mutate(new_val = purrr::map_dbl(row_number(), 
                         ~weighted.mean(value[-.x], wt[-.x])))


#   state county  year value    wt new_val
#   <chr> <chr>  <int> <dbl> <dbl>   <dbl>
# 1 AL    a       2011    91     1    36  
# 2 AL    b       2011    37     3    62  
# 3 AL    c       2011    33     1    50.5
# 4 AL    a       2012    46     4    87.6
# 5 AL    b       2012    80     5    71.5
# 6 AL    c       2012    97     4    64.9
# 7 CA    d       2011     4     5    72.1
# 8 CA    e       2011    85     5    27.1
# 9 CA    f       2011    56     4    44.5
#10 CA    d       2012    19     1    90.7
#11 CA    e       2012    90     5    56.5
#12 CA    f       2012    94     1    78.2
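A vectorized alternative (a sketch, not benchmarked here) avoids the per-row map_dbl loop by reusing the trick from the unweighted formula in the question: subtract each row's own contribution from the group totals of wt * value and of wt. This minimal version uses base R's ave() for the grouped sums so it runs without dplyr; grp_wx and grp_w are illustrative names.

```r
# Toy data from the question, built with base R only
df <- data.frame(
  state  = rep(c("AL", "CA"), each = 6),
  county = rep(letters[1:6], each = 2),
  year   = rep(2011:2012, 6),
  value  = c(91, 46, 37, 80, 33, 97, 4, 19, 85, 90, 56, 94),
  wt     = c(1, 4, 3, 5, 1, 4, 5, 1, 5, 5, 4, 1)
)
# Same row order as arrange(state, year); order() keeps ties stable
df <- df[order(df$state, df$year), ]
rownames(df) <- NULL

# Group totals of wt * value and of wt, repeated on every row of the
# state/year group (ave() returns a vector aligned with the input rows)
grp_wx <- ave(df$wt * df$value, df$state, df$year, FUN = sum)
grp_w  <- ave(df$wt, df$state, df$year, FUN = sum)

# Leave-one-out weighted mean: remove the row's own contribution
# from both the numerator and the denominator
df$new_val <- (grp_wx - df$wt * df$value) / (grp_w - df$wt)

print(df)
```

Inside the dplyr pipeline, the same expression can be written as mutate(new_val = (sum(wt * value) - wt * value) / (sum(wt) - wt)) after group_by(state, year). Note that a group containing a single row gives 0/0, i.e. NaN, for that row.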
Ronak Shah