I want to use dplyr::summarise_at on a collection of columns with a function object that uses an additional variable.

For example, consider the following data frame with two numeric variables and an indicator.

library(dplyr, warn.conflicts = FALSE)

t1 <- data.frame(lg = c(TRUE, FALSE, TRUE, FALSE),
                 x1 = 1:4,
                 x2 = 5:8)

Using dplyr::funs() inside summarise_at works:

t1 %>% summarise_at(c("x1", "x2"), funs(mean(. * lg)))
#>   x1 x2
#> 1  1  3

However, I would prefer to pass a function object to summarise_at, instead of a call to dplyr::funs(), so that R CMD check doesn't complain about the unknown variable `.`.

Unfortunately, when I try to do this, summarise_at can't find the indicator variable lg.

t1 %>% summarise_at(c("x1", "x2"), function(x) mean(x * lg))
#> Error in summarise_impl(.data, dots): Evaluation error: object 'lg' not found.

Is there a way to pass a function object to summarise_at when the function body refers to another variable in the data frame, as lg is used here?

  • In this specific case using `mean(x * t1$lg)` will work. – Shree Oct 31 '18 at 23:51
  • @AJP123: I don't think it's possible to do that without using either `funs` or `list`. Here is my reasoning https://stackoverflow.com/questions/52730562/mutate-impl-data-dots-evaluation-error-object-not-found/52730673#comment92417716_52730673 – Tung Nov 01 '18 at 05:26
  • @Tung beautiful. Are you happy to post your comment as an answer and then I can accept it? –  Nov 01 '18 at 22:26
  • @AJP123: Sure. See my answer below – Tung Nov 01 '18 at 22:36

1 Answer

I don't think it's possible to do that without using either funs() or list() inside summarise_at(). My guess is that a bare tilde ~ with . / .x, or similarly function(x) with x, creates an anonymous function with its own environment. It therefore only knows about the column that is passed in through . or .x (or x), and not about the other variables in the data frame, such as lg. When we wrap the expression in funs() or list(), on the other hand, it probably inherits the whole environment, which makes it aware of every column in the data frame and not just the one passed via `.`.
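The scoping idea can be illustrated in base R alone. This is only a sketch by analogy: it assumes that dplyr evaluates funs()/list(~ ...) expressions with the data frame's columns in scope, much as with() does, and the names f, err_msg and masked are made up for the illustration.

```r
t1 <- data.frame(lg = c(TRUE, FALSE, TRUE, FALSE),
                 x1 = 1:4,
                 x2 = 5:8)

# A plain function looks up free variables (here `lg`) in the environment
# where it was defined, not in the data frame its argument came from.
f <- function(x) mean(x * lg)

# No object called `lg` exists outside the data frame, so the call fails;
# capture the error message instead of stopping.
err_msg <- tryCatch(f(t1$x1), error = function(e) conditionMessage(e))

# Evaluating the same expression with the columns of t1 in scope works.
masked <- with(t1, mean(x1 * lg))
masked
#> [1] 1
```

Here err_msg holds the "object 'lg' not found" message, which matches the error that summarise_at reported in the question.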

If you want clearer code, you could write a named function and then call it inside summarise_at():

multiplyx <- function(x, lg){
  result <- mean(x * lg)
  return(result)
}

t1 %>% summarise_at(c("x1", "x2"), funs(multiplyx(., lg)))
#>   x1 x2
#> 1  1  3

t1 %>% summarise_at(c("x1", "x2"), list(~ multiplyx(., lg)))
#>   x1 x2
#> 1  1  3
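If a genuine function object is required, one possible workaround, building on the `t1$lg` suggestion in the comments, is to capture the indicator in a closure before handing the function over. The sketch below uses only base R so that it runs on its own; make_weighted_mean and weighted_mean are hypothetical names, and the approach assumes ungrouped data, since the captured vector always has the full length of t1.

```r
t1 <- data.frame(lg = c(TRUE, FALSE, TRUE, FALSE),
                 x1 = 1:4,
                 x2 = 5:8)

# Function factory (hypothetical helper): returns a function object that
# carries its own copy of the weights `w` in its enclosing environment.
make_weighted_mean <- function(w) {
  force(w)  # evaluate `w` now rather than lazily
  function(x) mean(x * w)
}

weighted_mean <- make_weighted_mean(t1$lg)

# Apply the function object to each selected column.
res <- vapply(t1[c("x1", "x2")], weighted_mean, numeric(1))
res
#> x1 x2 
#>  1  3
```

Because weighted_mean no longer contains any free column names, the same object should be usable as `t1 %>% summarise_at(c("x1", "x2"), weighted_mean)`, and R CMD check has no `.` or `lg` to complain about.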

See also this

Tung
  • I just asked a similar question [here](https://stackoverflow.com/questions/59185751/dplyr-summarise-with-list-of-function-and-dependence-on-other-data-column). My problem is that I have a list of functions I want to use in `summarize_at`, not just one. I could manually write those functions in the `list(...)` within `summarize_at`, but that would be annoying. Is there a nice way to deal with my case? – Ben Dec 05 '19 at 00:30