I'm looking for a faster way to calculate a group mean over multiple grouping variables while excluding each group's own value. As a thought experiment: for each county, find the average value (e.g. price) across the other counties in the same state and year, leaving out that county's own value. Here's a toy data set.
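library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(
  state  = rep(c("AL", "CA"), each = 6),
  county = rep(letters[1:6], each = 2),
  year   = rep(2011:2012, 6),
  value  = sample.int(100, 12)  # random values, so results differ between runs
)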
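# For each (state, county, year) row, average the values of the
# other counties in the same state and year
df %>%
  group_by(state, county, year) %>%
  summarise(q = mean(df$value[df$state == state & df$county != county & df$year == year]))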
# Groups:   state, county [6]
   state county  year     q
   <chr> <chr>  <int> <dbl>
 1 AL    a       2011  56
 2 AL    a       2012  46
 3 AL    b       2011  50.5
 4 AL    b       2012  52
 5 AL    c       2011  55.5
 6 AL    c       2012  29
 7 CA    d       2011  52.5
 8 CA    d       2012  32
 9 CA    e       2011  68.5
10 CA    e       2012  31.5
11 CA    f       2011  32
12 CA    f       2012  42.5
The code above gives me the desired result, but when I apply it to a larger dataset (with more grouping variables) it gets really slow. Do you have any suggestions on how to speed this up?
If the original approach is incorrect, please point that out as well.
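One direction that might be faster, sketched here under the assumption that every (state, county, year) combination occurs exactly once: the leave-one-out mean equals the group total minus the row's own value, divided by the number of other rows. A single grouped `mutate()` over `state` and `year` would then avoid re-scanning `df` for every group. The name `loo` is only illustrative.

```r
library(dplyr)

# Toy data mirroring the example above (values are random, so results vary)
df <- tibble(
  state  = rep(c("AL", "CA"), each = 6),
  county = rep(letters[1:6], each = 2),
  year   = rep(2011:2012, 6),
  value  = sample.int(100, 12)
)

# Leave-one-out mean within each (state, year) group:
# (group sum - own value) / (number of rows in the group - 1).
# ASSUMPTION: exactly one row per (state, county, year); with several rows
# per county this would only exclude the current row, not the whole county.
loo <- df %>%
  group_by(state, year) %>%
  mutate(q = (sum(value) - value) / (n() - 1)) %>%
  ungroup()

loo
```

If a (state, year) group contains only one county, this returns `NaN` for that row, which matches what `mean()` returns for an empty vector in the original code. Missing values in `value` would need separate handling.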