Sticking to the dplyr library, I need to calculate the weighted average of a variable by subgroups of other variables, using column indexes instead of column names. Here is an example:
library(dplyr)
data <- read.table(text = 'obs income education type weight
1 1000 A blue 10
2 2000 B yellow 1
3 1500 B blue 5
4 2000 A yellow 2
5 3000 B yellow 2',
header = TRUE)
Everything works when I use group_by, weighted.mean and mutate with column names for grouping:
df <- data %>%
  group_by(education, type) %>%
  mutate(weighted_income = weighted.mean(income, weight))
df
# A tibble: 5 x 6
# Groups: education, type [4]
obs income education type weight weighted_income
<int> <int> <fct> <fct> <int> <dbl>
1 1 1000 A blue 10 1000.
2 2 2000 B yellow 1 2667.
3 3 1500 B blue 5 1500.
4 4 2000 A yellow 2 2000.
5 5 3000 B yellow 2 2667.
But I need to use column indexes instead of column names. I was able to make group_by_at work, but only for one grouping column, like this (column 3 = education):
df %>%
  group_by_at(3) %>%
  mutate(weighted_income = weighted.mean(income, weight))
# A tibble: 5 x 6
# Groups: education [2]
obs income education type weight weighted_income
<int> <int> <fct> <fct> <int> <dbl>
1 1 1000 A blue 10 1167.
2 2 2000 B yellow 1 1938.
3 3 1500 B blue 5 1938.
4 4 2000 A yellow 2 1167.
5 5 3000 B yellow 2 1938.
But I get an error when grouping by two columns (education = column 3, type = column 4):
df %>%
group_by_at(3,4) %>%
mutate(weighted_income = weighted.mean(income, weight))
Error: Can't create call to non-callable object
How can I make this last piece of code work for sub-groups? My question is related to this topic on grouping using column indexes rather than column names, but the answers there only cover a single grouping column, not sub-groups.
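One direction that may be worth checking, as a minimal sketch rather than a confirmed answer: in group_by_at(3, 4) the 4 appears to be matched to the second argument of group_by_at (.funs) instead of being treated as another column, which would explain the "non-callable object" error. Passing both positions as a single vector keeps them together in .vars. The name group_cols below is only illustrative, and the sketch assumes a dplyr version that still provides group_by_at.

```r
library(dplyr)

data <- read.table(text = 'obs income education type weight
1 1000 A blue 10
2 2000 B yellow 1
3 1500 B blue 5
4 2000 A yellow 2
5 3000 B yellow 2',
header = TRUE)

# Assumption: column positions are supplied as ONE numeric vector,
# so both of them are used as grouping variables.
group_cols <- c(3, 4)  # hypothetical name; 3 = education, 4 = type

result <- data %>%
  group_by_at(group_cols) %>%
  mutate(weighted_income = weighted.mean(income, weight)) %>%
  ungroup()

result
```

With the sample data this should reproduce the weighted_income column from the name-based group_by(education, type) version. In dplyr 1.0.0 and later, group_by(across(c(3, 4))) is likely an equivalent spelling, since group_by_at is marked as superseded there.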