I would like a function that accepts a tibble and a character vector naming a variable number of columns in that tibble, and then performs operations such as group_by on those columns.
Here is an example that does it for 0, 1, or 2 columns:
library(tidyverse)
ex = crossing(abc=LETTERS[1:3], xyz=LETTERS[24:26]) %>% mutate(n = row_number())
group_flexibly = function(tbl, group_by_cols=character(0)) {
  if (length(group_by_cols)==0) {
    tbl %>%
      summarize(.groups='keep', mean_n = mean(n))
  } else if (length(group_by_cols)==1) {
    tbl %>%
      group_by(!!as.name(group_by_cols[1])) %>%
      summarize(.groups='keep', mean_n=mean(n))
  } else if (length(group_by_cols)==2) {
    tbl %>%
      group_by(!!as.name(group_by_cols[1]), !!as.name(group_by_cols[2])) %>%
      summarize(.groups='keep', mean_n=mean(n))
  }
}
group_flexibly(ex)
group_flexibly(ex, 'abc')
group_flexibly(ex, 'xyz')
group_flexibly(ex, c('abc','xyz'))
Output is as desired:
> group_flexibly(ex)
# A tibble: 1 × 1
mean_n
<dbl>
1 5
> group_flexibly(ex, 'abc')
# A tibble: 3 × 2
# Groups: abc [3]
abc mean_n
<chr> <dbl>
1 A 2
2 B 5
3 C 8
> group_flexibly(ex, 'xyz')
# A tibble: 3 × 2
# Groups: xyz [3]
xyz mean_n
<chr> <dbl>
1 X 4
2 Y 5
3 Z 6
> group_flexibly(ex, c('abc','xyz'))
# A tibble: 9 × 3
# Groups: abc, xyz [9]
abc xyz mean_n
<chr> <chr> <dbl>
1 A X 1
2 A Y 2
3 A Z 3
4 B X 4
5 B Y 5
6 B Z 6
7 C X 7
8 C Y 8
9 C Z 9
So far so good. Now, how can I write such a function so that it works for a character vector of arbitrary length?
Here are two things that do not work:
group_by_cols = c('abc','xyz')
ex %>% group_by(!!as.name(group_by_cols)) %>% summarize(.groups='keep', mean_n=mean(n))
ex %>% group_by({{group_by_cols}}) %>% summarize(.groups='keep', mean_n=mean(n))
Problems encountered so far:
- !!as.name(group_by_cols) only uses group_by_cols[1] and ignores the rest of the vector.
- {{group_by_cols}} throws an error if length(group_by_cols) != 1.
- Popular StackOverflow discussions such as this do not address a need for the length of the vector of column names to be variable.
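For reference, one direction that may cover the variable-length case is to select the grouping columns with across(all_of(...)) inside group_by, which takes a character vector of column names directly. The following is only a minimal sketch, assuming dplyr 1.0.0 or later; the function name group_flexibly_any is hypothetical and not part of any package, and the example tibble is rebuilt so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same example data as above, rebuilt so this block is self-contained.
ex <- crossing(abc = LETTERS[1:3], xyz = LETTERS[24:26]) %>%
  mutate(n = row_number())

# Hypothetical helper: groups by however many column names are supplied.
# all_of() treats group_by_cols as a character vector of column names
# (and errors if a name is missing); across() hands the selected columns
# to group_by().
group_flexibly_any <- function(tbl, group_by_cols = character(0)) {
  tbl %>%
    group_by(across(all_of(group_by_cols))) %>%
    summarize(.groups = "keep", mean_n = mean(n))
}

res_one <- group_flexibly_any(ex, "abc")
res_two <- group_flexibly_any(ex, c("abc", "xyz"))
```

Here res_one and res_two should match the one-column and two-column results shown earlier. An empty vector is expected to leave the tibble ungrouped, although that case may be worth checking against the installed dplyr version. A similar effect can probably be had by splicing symbols, as in group_by(!!!rlang::syms(group_by_cols)), which stays closer to the !!as.name() attempts above.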