
I am trying to use group_by within a function call in dplyr (R) and I am getting unexpected results. Here is an example of what I am trying to do:

df = data.frame(a = c(0,0,1,1), b = c(0,1,0,1), c = c(1,2,3,4))

result1 = df %>%
  group_by(a,b) %>%
  mutate(d = sum(c))
result1$d

myFunc <- function(df, var) {
  output = df %>%
    group_by(a,!!var) %>%
    mutate(d = sum(c))
  return(output)
}

result2 = myFunc(df,"b")
result2$d

result1$d yields [1,2,3,4] which is what I expected. result2$d yields [3,3,7,7] which I do not want, and I am not sure what is going on.

It does work if I pass b (without quotes) as the function argument and use {{var}} in place of !!var. Unfortunately, in my case the column names are strings (but maybe there is a way to transform the string beforehand so that it works with the {{}} notation?)

David Mao
  • I believe you can use `get()`: `group_by(a, get(var))` – VvdL Jul 22 '22 at 06:37
  • @VvdL yes, but you will get an additional column named `\`get(var)\`` in the output. That's not perfect. – Darren Tsai Jul 22 '22 at 06:57
  • Does this answer your question? [How to pass column name as argument to function for dplyr verbs?](https://stackoverflow.com/questions/67382081/how-to-pass-column-name-as-argument-to-function-for-dplyr-verbs) – user438383 Jul 22 '22 at 07:18

3 Answers


If you want to pass a character string that names a column of the data frame, convert it to a symbol and unquote it with !!sym(var):

myFunc <- function(df, var) {
  output = df %>%
    group_by(a, !!sym(var)) %>%
    mutate(d = sum(c))
  return(output)
}

myFunc(df, "b")
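To see why the conversion matters, here is a minimal sketch that reuses the question's df (the names by_string and by_symbol are only illustrative). Unquoting a plain string with !! inlines the constant "b", so every row gets the same value and only a actually splits the groups. sym() turns the string into a column reference first.

```r
library(dplyr)
library(rlang)

df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1), c = c(1, 2, 3, 4))
var <- "b"

# !!var inserts the literal string "b", i.e. group_by(a, "b"):
# a constant column is created, so grouping is effectively by `a` alone.
by_string <- df %>%
  group_by(a, !!var) %>%
  mutate(d = sum(c))
by_string$d  # 3 3 7 7, the unwanted result from the question

# sym(var) builds the symbol `b`, which !! then inserts as a column reference.
by_symbol <- df %>%
  group_by(a, !!sym(var)) %>%
  mutate(d = sum(c))
by_symbol$d  # 1 2 3 4
```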

If you want to pass a data-masked argument, you should use {{ var }} or equivalently !!enquo(var):

myFunc <- function(df, var) {
  output = df %>%
    group_by(a, {{ var }}) %>%
    mutate(d = sum(c))
  return(output)
}

myFunc(df, b)

Note that I pass "b" and b respectively into the function in the two different cases.
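Another option for string column names, sketched here assuming a reasonably recent dplyr and rlang, is the .data pronoun, which looks a column up by name inside a data-masking verb. The function name myFuncData is hypothetical and only used for this illustration.

```r
library(dplyr)

df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1), c = c(1, 2, 3, 4))

# Hypothetical variant of myFunc: `var` is a string naming the grouping column.
myFuncData <- function(df, var) {
  df %>%
    group_by(a, .data[[var]]) %>%  # look up the column whose name is stored in var
    mutate(d = sum(c))
}

res <- myFuncData(df, "b")
res$d  # 1 2 3 4
```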

Darren Tsai

If we want to use quoting and unquoting instead of curly-curly {{}}, then we should follow this basic procedure: https://tidyeval.tidyverse.org/dplyr.html

Creating a function around dplyr pipelines involves three steps: abstraction, quoting, and unquoting.

1. Abstraction step:

  • Here we identify the parts that vary. In our case, that is var in group_by().

2. Quoting step:

  • Identify all the arguments where the user is allowed to refer to data frame columns directly.
  • The function can’t evaluate these arguments right away.
  • Instead, they should be automatically quoted: apply enquo() to these arguments.

3. Unquoting step:

  • Identify where these variables are passed to other quoting functions and unquote with !!.
  • In this case we pass var to group_by():

myFunc <- function(df, var) {
  var <- enquo(var)
  output = df %>%
    group_by(a,!!var) %>%
    mutate(d = sum(c))
  return(output)
}

result2 = myFunc(df,b)
result2$d

Output:

[1] 1 2 3 4
TarJae

Just as I posted the question, I came across something that works:

myFunc <- function(df, var) {
  output = df %>%
    group_by_at(.vars = c("a",var)) %>%
    mutate(d = sum(c))
  return(output)
}

result2 = myFunc(df,"b")
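As a side note, the scoped verbs such as group_by_at() are marked as superseded since dplyr 1.0.0 in favour of across(). A rough equivalent, assuming dplyr 1.0.0 or later, could look like the sketch below. The name myFuncAcross is hypothetical, and all_of() is used so that a missing column name raises an error instead of being silently ignored.

```r
library(dplyr)

df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1), c = c(1, 2, 3, 4))

# Hypothetical variant of myFunc using across() instead of group_by_at().
myFuncAcross <- function(df, var) {
  df %>%
    group_by(across(all_of(c("a", var)))) %>%  # select grouping columns by name
    mutate(d = sum(c))
}

result3 <- myFuncAcross(df, "b")
result3$d  # 1 2 3 4
```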
David Mao