
Using only dplyr, I need to calculate the weighted average of a variable by subgroups defined by other variables, using column indexes instead of column names. Here is an example:

data <- read.table(text = 'obs income education type weight   
                            1   1000      A     blue    10     
                            2   2000      B     yellow   1     
                            3   1500      B     blue     5     
                            4   2000      A     yellow   2 
                            5   3000      B     yellow   2', 
                   header = TRUE)

Everything works with group_by, weighted.mean and mutate when I use column names for grouping:

df <- data %>%
     group_by(education,type) %>% 
     mutate(weighted_income = weighted.mean(income, weight))
df
# A tibble: 5 x 6
# Groups:   education, type [4]
    obs income education type   weight weighted_income
  <int>  <int> <fct>     <fct>   <int>           <dbl>
1     1   1000 A         blue       10           1000.
2     2   2000 B         yellow      1           2667.
3     3   1500 B         blue        5           1500.
4     4   2000 A         yellow      2           2000.
5     5   3000 B         yellow      2           2667.

But I need to use column indexes instead of column names. I was able to make group_by_at work, but only for one grouping variable, like this (column 3 = education):

df %>%
   group_by_at(3) %>% 
   mutate(weighted_income = weighted.mean(income, weight))
df
# A tibble: 5 x 6
# Groups:   education [2]
    obs income education type   weight weighted_income
  <int>  <int> <fct>     <fct>   <int>           <dbl>
1     1   1000 A         blue       10           1167.
2     2   2000 B         yellow      1           1938.
3     3   1500 B         blue        5           1938.
4     4   2000 A         yellow      2           1167.
5     5   3000 B         yellow      2           1938.

But I get an error for sub-groups (education = column 3, type = column 4):

df %>%
   group_by_at(3,4) %>% 
   mutate(weighted_income = weighted.mean(income, weight))

Error: Can't create call to non-callable object

How can I make this last piece of code work for sub-groups? My question is related to this topic on grouping using column indexes rather than column names, but the answers there only cover a single grouping variable, not sub-groups.

David Arenburg
Elixterra

1 Answer


We need to concatenate the indexes. Without that, group_by_at matches its arguments by position, so it takes 3 as .vars and 4 as .funs, based on the usage:

group_by_at(.tbl, .vars, .funs = list(), ..., .add = FALSE)
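The positional matching can be illustrated with a small stand-in function. This is only a sketch: `standin` is a hypothetical name, not part of dplyr, and it merely copies the first parameters of the usage above to show where each argument lands.

```r
# Hypothetical stand-in that mirrors the leading parameters of group_by_at().
# It only reports which argument each value was matched to.
standin <- function(.tbl, .vars, .funs = list(), ...) {
  list(.vars = .vars, .funs = .funs)
}

# Two separate indexes: 3 is matched to .vars, 4 is matched to .funs.
separate <- standin(NULL, 3, 4)

# One concatenated vector: both indexes are matched to .vars,
# and .funs keeps its default of list().
combined <- standin(NULL, c(3, 4))

str(separate)
str(combined)
```

In the piped call, the data frame fills `.tbl`, so the same matching applies to the remaining arguments.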

Therefore, concatenate the indexes with c() so that both are passed as .vars:

data %>% 
   group_by_at(c(3, 4)) %>%
   mutate(weighted_income = weighted.mean(income, weight))
# A tibble: 5 x 6
# Groups: education, type [4]
#    obs income education type   weight weighted_income
#  <int>  <int> <fctr>    <fctr>  <int>           <dbl>
#1     1   1000 A         blue       10            1000
#2     2   2000 B         yellow      1            2667
#3     3   1500 B         blue        5            1500
#4     4   2000 A         yellow      2            2000
#5     5   3000 B         yellow      2            2667

Or we can wrap them in vars() to make it explicit that they are the .vars:

data %>%
   group_by_at(vars(3, 4)) %>% 
   mutate(weighted_income = weighted.mean(income, weight))
# A tibble: 5 x 6
# Groups: education, type [4]
#    obs income education type   weight weighted_income
#  <int>  <int> <fctr>    <fctr>  <int>           <dbl>
#1     1   1000 A         blue       10            1000
#2     2   2000 B         yellow      1            2667
#3     3   1500 B         blue        5            1500
#4     4   2000 A         yellow      2            2000
#5     5   3000 B         yellow      2            2667
akrun
  • I noticed only today that while `group_by` and `group_by_at` give the same result when there is only 1 variable/index, the results are different for repeated groupings. The reason seems to be that, unlike `group_by`, `group_by_at()` does not override existing grouping correctly. Adding `ungroup() %>% group_by_at()` solves the problem. – Elixterra Apr 08 '18 at 19:09
  • @Elixterra There is some bug in that direction. You are right – akrun Apr 09 '18 at 03:06
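A minimal sketch of the workaround from the comments, assuming dplyr is installed. The data frame is rebuilt inline from the question's example, and `regrouped` is just an illustrative name. How `group_by_at()` treats an already grouped input may depend on the dplyr version, so calling `ungroup()` first is a defensive step rather than a confirmed requirement.

```r
library(dplyr)

# Example data from the question, rebuilt inline.
data <- data.frame(
  obs       = 1:5,
  income    = c(1000, 2000, 1500, 2000, 3000),
  education = c("A", "B", "B", "A", "B"),
  type      = c("blue", "yellow", "blue", "yellow", "yellow"),
  weight    = c(10, 1, 5, 2, 2)
)

# A data frame that already carries a grouping.
df <- data %>% group_by(education, type)

# Drop the existing grouping first, then group by column positions 3 and 4.
regrouped <- df %>%
  ungroup() %>%
  group_by_at(c(3, 4)) %>%
  mutate(weighted_income = weighted.mean(income, weight))

group_vars(regrouped)
```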