I have a workflow where I supply a vector of strings representing column names to a function that uses group_by on those columns. It works when I test it with one column name, but fails when I pass several.

The basic setup is this:

group_summs <- function(df, grouping_vars) {

  if(length(grouping_vars == 1)) {

    group_var <- ensym(grouping_vars)

    df %>%
      group_by(!! group_var) %>% 
      summarise(n_test = n())

  } else {

    group_vars <- grouping_vars

    df %>% 
      group_by_at(.vars = group_vars) %>% 
      summarise(n_test = n())

  }
}

#Single column test
flights <- nycflights13::flights
col_test <- c("origin")

#This Works
group_summs(flights, col_test)

#Multiple columns test
col_test_2 <- c("origin", "carrier")

#This fails
group_summs(flights, col_test_2)

So as a test I can pass a single column name and have it run, but when I pass several column names I get an rlang error.

"Error: Only strings can be converted to symbols Call rlang::last_error() to see a backtrace Called from: rlang::abort(x)"

What I really don't get is why the multiple column example runs correctly outside of the function as in:

#Runs just fine
col_test_2 <- c("origin", "carrier")
flights %>% group_by_at(.vars = col_test_2) %>% summarise(n_test = n())

Is there something about the function environment that I am not understanding, or is this a bug?

I am using dplyr (0.8.3) and rlang (0.4.0).

This question is very similar to "Group by multiple columns in dplyr, using string vector input", but the solutions on that question result in the same error, so I wonder whether there is now a more recent solution (the current solution there dates from 2017).

Adam Kemberling

1 Answer

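The condition is not correct. The closing parenthesis is in the wrong place, so it takes the length of the comparison grouping_vars == 1 rather than comparing the length of grouping_vars to 1: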

length(grouping_vars == 1)

It should be

length(grouping_vars) == 1
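
To see why the misplaced parenthesis matters, here is a minimal base R sketch. The vector below simply mirrors col_test_2 from the question; the names cmp, wrong and right are only for illustration.

```r
# Hypothetical input mirroring col_test_2 from the question
grouping_vars <- c("origin", "carrier")

# The comparison is evaluated first and is elementwise,
# so it yields one logical value per column name
cmp <- grouping_vars == 1

# Original condition: the length of that logical vector
wrong <- length(grouping_vars == 1)

# Corrected condition: compare the length of the input with 1
right <- length(grouping_vars) == 1

print(cmp)    # FALSE FALSE
print(wrong)  # 2
print(right)  # FALSE
```

Because if() treats any non-zero number as TRUE, the original condition sends every non-empty vector down the ensym() branch, including the two-column case.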

Full code:

group_summs <- function(df, grouping_vars) {

  if(length(grouping_vars) == 1) {

    group_var <- ensym(grouping_vars)

    df %>%
      group_by(!! group_var) %>% 
      summarise(n_test = n())

  } else {

    group_vars <- grouping_vars

    df %>% 
      group_by_at(.vars = group_vars) %>% 
      summarise(n_test = n())

  }
}

group_summs(flights, col_test_2)
# A tibble: 35 x 3
# Groups:   origin [3]
#   origin carrier n_test
#   <chr>  <chr>    <int>
# 1 EWR    9E        1268
# 2 EWR    AA        3487
# 3 EWR    AS         714
# 4 EWR    B6        6557
# 5 EWR    DL        4342
# 6 EWR    EV       43939
# 7 EWR    MQ        2276
# 8 EWR    OO           6
# 9 EWR    UA       46087
#10 EWR    US        4405
# … with 25 more rows

group_summs(flights, col_test)
# A tibble: 3 x 2
#  origin n_test
#  <chr>   <int>
#1 EWR    120835
#2 JFK    111279
#3 LGA    104662

However, the condition is not needed at all, because group_by_at accepts a character vector of any length >= 1:

group_summs2 <- function(df, grouping_vars) {

  group_vars <- grouping_vars

  df %>%
    group_by_at(.vars = group_vars) %>%
    summarise(n_test = n())

}

group_summs2(flights, col_test)
# A tibble: 3 x 2
#  origin n_test
#  <chr>   <int>
#1 EWR    120835
#2 JFK    111279
#3 LGA    104662

group_summs2(flights, col_test_2)
# A tibble: 35 x 3
# Groups:   origin [3]
#   origin carrier n_test
#   <chr>  <chr>    <int>
# 1 EWR    9E        1268
# 2 EWR    AA        3487
# 3 EWR    AS         714
# 4 EWR    B6        6557
# 5 EWR    DL        4342
# 6 EWR    EV       43939
# 7 EWR    MQ        2276
# 8 EWR    OO           6
# 9 EWR    UA       46087
#10 EWR    US        4405
# … with 25 more rows
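
Another option, not part of the answer above, is to convert the strings to symbols with rlang::syms() and splice them into group_by() with !!!. This is only a sketch, assuming dplyr and rlang are installed; group_summs3 is a hypothetical name, and the built-in mtcars data stands in for flights so the example needs no extra package.

```r
library(dplyr)
library(rlang)

# Hypothetical variant: turn each column name into a symbol,
# then splice the list of symbols into group_by()
group_summs3 <- function(df, grouping_vars) {
  df %>%
    group_by(!!!syms(grouping_vars)) %>%
    summarise(n_test = n())
}

# Works the same way for one column name or several
one_col  <- group_summs3(mtcars, "cyl")
two_cols <- group_summs3(mtcars, c("cyl", "gear"))
```

In dplyr 1.0.0 and later, where the _at variants are superseded, group_by(across(all_of(grouping_vars))) should serve the same purpose.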
akrun
  • So since the solution (thanks by the way) is almost irrelevant to the title and setup of the question, is it beneficial to keep it up the way it is? – Adam Kemberling Aug 19 '19 at 20:28
  • @adamkemberling You can keep it because the solution gives multiple ways to solve it. It will help in clarifying some concepts – akrun Aug 19 '19 at 20:29