I am trying to understand the expected output of dplyr::group_by() in conjunction with dplyr::all_of(). My understanding is that dplyr::all_of() should convert character vectors containing variable names to bare names so that group_by() can group by them, but this doesn't appear to happen.
Below, I generate some fake data, pass different objects to group_by() with and without all_of(), and calculate the number of observations in each group. In the example, passing a single bare column name without dplyr::all_of() produces the correct output: one row per unique value of the column. However, passing a character vector, with or without dplyr::all_of(), produces incorrect output: a single row regardless of the number of unique values in the column.
What is the expected behavior of all_of(), and how else might I pass a character vector to group_by() so that it is processed as a vector of bare names?
library(dplyr)
# Create a 20-row data.frame with
# 2 variables, each with 2 unique values.
df <- data.frame(var = rep(c("a", "b"), 10),
                 bar = rep(c(1, 2), 10))
# Output 1: 2x2 tibble - GOOD
df %>% group_by(var) %>% summarize(n = n())
# Output 2: 1x2 tibble - BAD
foo <- "var"
df %>% group_by(all_of(foo)) %>% summarize(n = n())
# Output 3: 1x2 tibble - BAD
df %>% group_by("var") %>% summarize(n = n())
# Output 4: Error in_var not found - BAD
foo2 <- list("var", "bar")
lapply(foo2, function(in_var) {
  df %>%
    group_by(in_var) %>%
    summarize(n = n())
})
# Output 5: list of length 2 where
# each element is a 1x2 tibble - BAD
foo2 <- list("var", "bar")
lapply(foo2, function(in_var) {
  df %>%
    group_by(all_of(in_var)) %>%
    summarize(n = n())
})
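For reference, here is a minimal sketch of what seems to be going on and one possible way around it, assuming dplyr 1.0.0 or later. group_by() evaluates its arguments with data masking rather than tidy selection, so a string is treated as a constant value and all_of() is not in a context where it selects columns. Wrapping the selection in across() is one documented way to open a selection context inside group_by(); in dplyr 1.1 and later, pick() is intended to play the same role. The object names const_grp, by_across, and by_var are made up for this illustration.

```r
library(dplyr)

# Same shape as the data in the question: 20 rows, 2 unique values per column.
df <- data.frame(var = rep(c("a", "b"), 10),
                 bar = rep(c(1, 2), 10))

# A string passed to group_by() is data-masked, not selected: it becomes a
# constant column, so every row falls into a single group.
const_grp <- df %>%
  group_by("var") %>%
  summarize(n = n())

# across() accepts tidy-select helpers, so all_of() can turn a character
# vector into a column selection here.
foo <- c("var", "bar")
by_across <- df %>%
  group_by(across(all_of(foo))) %>%
  summarize(n = n(), .groups = "drop")

# The same pattern inside lapply(), grouping by one column name at a time.
# For a single name, group_by(.data[[in_var]]) is an alternative.
foo2 <- list("var", "bar")
by_var <- lapply(foo2, function(in_var) {
  df %>%
    group_by(across(all_of(in_var))) %>%
    summarize(n = n())
})
```

With this data, const_grp should have a single row, while by_across and each element of by_var should have one row per unique value of the grouping columns.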