
I'm trying to write a function, using dplyr syntax, that calls group_by inside the function. There seems to be a problem with the group_by statement, and I can't figure out what's wrong. When I pass "abc" as an argument and use select inside the function, it works as I would have expected (Gfunc1). When I try to group_by the same argument, I get an error:

Error: Column `dims` is unknown

Please see the example below. I really hope I haven't overlooked something embarrassingly simple... anyway, I would be grateful for any help!

library(dplyr)


abc <- c("a","a","a","b","b","c")
num <- c(1,2,3,4,5,6) 
df <- data.frame(abc,num)


Gfunc1 <- function(dims) {
  test1 <- df %>% 
    select(dims)
  assign("test1", test1, envir = .GlobalEnv)
}

Gfunc2 <- function(dims) {
  test2 <- df %>% 
    group_by(dims)

  assign("test2", test2, envir = .GlobalEnv)
}

Gfunc1("abc") 
# Returns as expected: data frame test1 with only the column "abc"

Gfunc2("abc")
# Does not return what I expect; gives the error: Column `dims` is unknown
asked by martinlj, edited by NelsonGon
2 Answers


One can solve this with the {{ }} ("curly-curly") operator (I'm using rlang 0.4.1 and dplyr 0.8.3), as follows.

The issue is that writing functions that call dplyr verbs takes a bit of extra work, because those verbs use tidy evaluation, a form of non-standard evaluation (NSE). I added df as an argument because it is better to pass the dataset in explicitly than to pick it up from an outside environment. Gfunc1 works because select() also accepts column names given as strings, so dims is resolved to its value "abc"; group_by() instead treats the bare name dims as a column of df, and there is no such column:

Gfunc2 <- function(df = NULL, dims) {
  test2 <- df %>% 
    group_by({{dims}})

  assign("test2", test2, envir = .GlobalEnv)
}
Gfunc2(df, abc)  # with {{ }} the column name is passed unquoted
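A minimal, self-contained check of the {{ }} approach, assuming dplyr >= 0.8.3 and rlang >= 0.4.0. It returns the grouped data instead of assigning it, and the name group_by_col is hypothetical. The column is passed unquoted; passing the string "abc" here would not group by the column abc.

```r
library(dplyr)

df <- data.frame(abc = c("a", "a", "a", "b", "b", "c"),
                 num = c(1, 2, 3, 4, 5, 6))

# Hypothetical helper: {{ dims }} forwards the caller's unquoted column
# name to group_by(), so group_by_col(df, abc) groups by the column abc.
group_by_col <- function(df, dims) {
  df %>% group_by({{ dims }})
}

test2 <- group_by_col(df, abc)
group_vars(test2)  # "abc"
```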

For earlier versions of rlang and dplyr, the same can be achieved with sym() and !!; note that this version takes the column name as a string:

Gfunc2 <- function(df = NULL, dims) {
  test2 <- df %>% 
    group_by(!!sym(dims))

  assign("test2", test2, envir = .GlobalEnv)
}
Gfunc2(df,"abc")
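A sketch of the string-based version, plus a hypothetical extension to several columns: rlang's syms() turns a character vector into symbols and !!! splices all of them into group_by(). The helper names and the extra column grp are made up for illustration.

```r
library(dplyr)
library(rlang)  # sym(), syms(), !!, !!!

df <- data.frame(abc = c("a", "a", "a", "b", "b", "c"),
                 grp = c("x", "y", "x", "y", "x", "y"),  # hypothetical column
                 num = c(1, 2, 3, 4, 5, 6))

# The column name arrives as a string; sym() turns it into a symbol and
# !! injects that symbol into the group_by() call.
group_by_name <- function(df, dims) {
  df %>% group_by(!!sym(dims))
}

# Same idea for a character vector of names: syms() + !!! splices each one.
group_by_names <- function(df, dims) {
  df %>% group_by(!!!syms(dims))
}

one <- group_by_name(df, "abc")
two <- group_by_names(df, c("abc", "grp"))
```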

NOTE

  1. It is almost always better to return results from the function (and collect several of them in a list) than to assign them into .GlobalEnv.
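A minimal sketch of that note, assuming the same df as in the question: the function returns its result, and the caller keeps related results together in a named list. The list element names are arbitrary.

```r
library(dplyr)

df <- data.frame(abc = c("a", "a", "a", "b", "b", "c"),
                 num = c(1, 2, 3, 4, 5, 6))

# Return the grouped data rather than calling assign(..., envir = .GlobalEnv);
# the caller decides where the result is stored.
Gfunc2 <- function(df, dims) {
  df %>% group_by({{ dims }})
}

# Keep several related results together in one named list.
results <- list(
  selected = df %>% select(abc),
  grouped  = Gfunc2(df, abc)
)

totals <- results$grouped %>% summarise(sum = sum(num))
```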
answered by NelsonGon
  • Good solution. Mine covers a new approach that RStudio shared recently – Johan Rosa Nov 12 '19 at 14:08
  • Works perfectly, thank you. I did not know that about dplyr functions... Both that and the !!sym() and {{}} are real key pieces, not only for this particular question but overall. All this PLUS putting results in a list instead of changing envir =. Many thanks – martinlj Nov 18 '19 at 13:40

You can also write the functions so that they forward the dots (...) to select() and group_by(). This way you can select or group by more than one variable at a time, passing unquoted column names (NSE).

Gfunc1 <- function(.df, ...) {
  test1 <- .df %>%
    select(...)

  assign("test1", test1, envir = .GlobalEnv)
}

Gfunc2 <- function(.df, ...) {
  test2 <- .df %>%
    group_by(...)

  assign("test2", test2, envir = .GlobalEnv)
}

Gfunc1(df, abc)
Gfunc2(df, abc)

Results:

> test1
  abc
1   a
2   a
3   a
4   b
5   b
6   c

test2 %>%
   summarise(sum = sum(num))

  abc     sum
  <fct> <dbl>
1 a         6
2 b         9
3 c         6
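A sketch of the dots version grouping by two columns at once, which is what forwarding ... makes possible. The extra column grp is hypothetical, added only so there is a second variable to group by, and the function returns its result instead of assigning it.

```r
library(dplyr)

df <- data.frame(abc = c("a", "a", "a", "b", "b", "c"),
                 grp = c("x", "y", "x", "y", "x", "y"),  # hypothetical column
                 num = c(1, 2, 3, 4, 5, 6))

# ... forwards any number of unquoted column names to group_by() unchanged.
Gfunc2 <- function(.df, ...) {
  .df %>% group_by(...)
}

test2 <- Gfunc2(df, abc, grp)
totals <- test2 %>% summarise(sum = sum(num))
# one row per (abc, grp) combination: a-x, a-y, b-x, b-y, c-y
```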

For more on this, see the RStudio Conference material "Selecting and doing with tidy eval" (slides, video).

answered by Johan Rosa
  • Hey Johan, I have not tried your approach yet, but will do so next time I get the opportunity. I am sure it works perfectly. Thank you so much for answering! – martinlj Nov 18 '19 at 13:46