
My question is similar to this question, but I need to apply a more complex function across columns, and I can't figure out how to apply Lionel's suggested solution to a custom function with a scoped verb like `filter_at()` or a `filter()` + `across()` equivalent. It doesn't look like a "superstache" (`{{{ }}}`) operator has been introduced.

Here is a non-programmatic example of what I want to do (it doesn't use NSE):

library(dplyr)
library(magrittr)

foo <- tibble(group = c(1,1,2,2,3,3),
              a = c(1,1,0,1,2,2),
              b = c(1,1,2,2,0,1))

foo %>%
  group_by(group) %>%
  filter_at(vars(a,b), any_vars(n_distinct(.) != 1)) %>%
  ungroup
#> # A tibble: 4 x 3
#>   group     a     b
#>   <dbl> <dbl> <dbl>
#> 1     2     0     2
#> 2     2     1     2
#> 3     3     2     0
#> 4     3     2     1

I haven't yet found a `filter()` + `across()` equivalent of this `filter_at()` line, but since the new(ish) tidy eval functions predate dplyr 1.0, I assume that issue can be set aside. Here is my attempt at a programmatic version where the filtering variables are supplied by the user through the dots:
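For reference, assuming dplyr 1.0.4 or later, `if_any()` (together with `if_all()`) is the `across()`-family helper meant to replace `any_vars()`/`all_vars()` inside `filter()`. A minimal sketch of the hard-coded `filter_at()` line above rewritten that way, not tested against every dplyr release:

```r
library(dplyr)  # assumes dplyr >= 1.0.4, where if_any() is available

foo <- tibble(group = c(1, 1, 2, 2, 3, 3),
              a = c(1, 1, 0, 1, 2, 2),
              b = c(1, 1, 2, 2, 0, 1))

# Keep a group if ANY of the listed columns is non-constant within it,
# i.e. the if_any() counterpart of any_vars(n_distinct(.) != 1)
res <- foo %>%
  group_by(group) %>%
  filter(if_any(c(a, b), ~ n_distinct(.x) != 1)) %>%
  ungroup()

res
```

The predicate returns one value per column per group; `if_any()` combines those with `|`, and `filter()` recycles the single result to every row of the group.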

my_function <- function(data, ..., by) {
  dots <- enquos(..., .named = TRUE)
  
  helperfunc <- function(arg) {
    return(any_vars(n_distinct(arg) != length(arg)))
  }
  
  dots <- lapply(dots, function(dot) call("helperfunc", dot))
  
  data %>%
    group_by({{ by }}) %>%
    filter(!!!dots) %>%
    ungroup
}

foo %>%
  my_function(a, b, group)
#> Error: Problem with `filter()` input `..1`.
#> x Input `..1` is named.
#> i This usually means that you've used `=` instead of `==`.
#> i Did you mean `a == helperfunc(a)`?
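The error arises because `enquos(..., .named = TRUE)` gives the spliced list the names `a` and `b`, and `filter()` rejects named inputs; `any_vars()` is also only meaningful inside `filter_at()`. Note too that `by` comes after `...`, so in `my_function(a, b, group)` the `group` argument is swallowed by the dots; it has to be passed as `by = group`. If you want to keep the build-the-expression approach, one sketch (the name `my_function_expr` is hypothetical) is to capture the dots unnamed and fold the per-column conditions together with `|`:

```r
library(dplyr)
library(rlang)

my_function_expr <- function(data, ..., by) {
  # Capture the column arguments WITHOUT names, so nothing named reaches filter()
  dots <- enquos(...)
  # One `n_distinct(<col>) != 1` condition per captured column
  conds <- lapply(dots, function(d) quo(n_distinct(!!d) != 1))
  # Fold the conditions with `|` to reproduce any_vars() semantics
  any_cond <- Reduce(function(x, y) quo(!!x | !!y), conds)

  data %>%
    group_by({{ by }}) %>%
    filter(!!any_cond) %>%
    ungroup()
}

foo <- tibble(group = c(1, 1, 2, 2, 3, 3),
              a = c(1, 1, 0, 1, 2, 2),
              b = c(1, 1, 2, 2, 0, 1))

# `by` must be named because it follows `...`
res <- foo %>% my_function_expr(a, b, by = group)
res
```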

I'd love it if there were a way to just plug an NSE operator into the `vars()` argument of `filter_at()` and not have to make all these extra calls (I assume this is what a `{{{ }}}` operator would do?).

lost

3 Answers


Maybe I'm misunderstanding what the issue is, but the standard pattern of forwarding the dots seems to work fine here:

my_function <- function(data, ..., by) {
  data %>%
    group_by({{ by }}) %>%
    filter_at(vars(...), any_vars(n_distinct(.) != 1)) %>%
    ungroup
}

foo %>%
  my_function(a, b, by = group)  # works
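The same dot-forwarding pattern should carry over to the `across()`-era helpers. A sketch assuming dplyr 1.0.4 or later (`my_function2` is a hypothetical name): wrapping `...` in `c()` turns the forwarded bare column names into a single tidyselect specification for `if_any()`.

```r
library(dplyr)  # assumes dplyr >= 1.0.4 for if_any()

my_function2 <- function(data, ..., by) {
  data %>%
    group_by({{ by }}) %>%
    # c(...) forwards the user's bare column names as one selection
    filter(if_any(c(...), ~ n_distinct(.x) != 1)) %>%
    ungroup()
}

foo <- tibble(group = c(1, 1, 2, 2, 3, 3),
              a = c(1, 1, 0, 1, 2, 2),
              b = c(1, 1, 2, 2, 0, 1))

res <- foo %>% my_function2(a, b, by = group)
res
```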
Artem Sokolov
  • Hadn't realized `ensyms` was needed here, thanks. I have a hard time finding up-to-date guides on NSE functions (`quo`, `enexprs`, `quo_name`, `as_name`, etc.). The programming vignette used to have more about this, but now it seems to be mainly about curly-curly. It's also hard to keep track of terminology: it seems like they are moving away from terms like "quotation" and "defusion" toward "indirection," "masking," and "embracing." But maybe I'm mixing things up. – lost Aug 05 '20 at 03:13
  • Agree with @MrFlick that an `across()` solution would also be interesting – lost Aug 05 '20 at 03:14
  • @MrFlick I'm not sure if there is a direct `across()` equivalent here. Lionel will probably correct me, but I'm pretty sure that `across()` operates on one column at a time. If a custom function needs to operate on multiple columns, I would probably group columns via `nest()` first. – Artem Sokolov Aug 05 '20 at 03:14
  • @lost Please see the edit. Turns out that `ensyms()` is not even needed here, since you can just forward the dots. A good up-to-date resource is probably the [Tidy evaluation book](https://tidyeval.tidyverse.org/). – Artem Sokolov Aug 05 '20 at 03:15
  • @MrFlick I don't think the issue is in the `.cols` argument of `across()`, but in the fact that functions provided in `.fns` operate on one column at a time. `any_vars()` is [specifically a `filter_at()` construct](https://dplyr.tidyverse.org/reference/filter_all.html), which is where the equivalency breaks. – Artem Sokolov Aug 05 '20 at 03:23
  • I noticed that in the `?filter_at` documentation, the `across()` translation given for the `any_vars()` example does not seem to produce the same results (I get an empty tibble for the `across()` example). This may be an unresolved issue with the migration toward `across()`. – lost Aug 05 '20 at 03:25
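To illustrate the point that `.fns` sees one column at a time: `across()` inside `summarise()` returns one result column per selected column, so an "any column" condition has to be combined afterwards. A small sketch using the question's `foo`, where the `Reduce()` step stands in for what `any_vars()` did implicitly:

```r
library(dplyr)

foo <- tibble(group = c(1, 1, 2, 2, 3, 3),
              a = c(1, 1, 0, 1, 2, 2),
              b = c(1, 1, 2, 2, 0, 1))

# One logical column per input column: is it non-constant within the group?
per_col <- foo %>%
  group_by(group) %>%
  summarise(across(c(a, b), ~ n_distinct(.x) != 1))
per_col

# Combine the per-column flags with `|` to get the groups to keep
keep_groups <- per_col$group[Reduce(`|`, per_col[c("a", "b")])]
keep_groups
```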

Here is a way to achieve this with `across()`, using a pattern covered in `vignette("colwise")`:

my_function <- function(data, vars, by) {
  
  data %>%
    group_by({{ by }}) %>%
    filter(n_distinct(across({{ vars }}, ~ .x)) != 1) %>%
    ungroup()
  
}
 
foo %>%
  my_function(c(a, b), by = group)

# A tibble: 4 x 3
  group     a     b
  <dbl> <dbl> <dbl>
1     2     0     2
2     2     1     2
3     3     2     0
4     3     2     1
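Why counting distinct rows works: within a group, all rows are identical exactly when every column is constant, so "more than one distinct row" is the same condition as "at least one column has more than one distinct value". A small base-R check of that equivalence on the question's three groups (the helper names `rows_differ` and `some_col_varies` are hypothetical):

```r
# Condition used in this answer: more than one distinct row
rows_differ <- function(df) nrow(unique(df)) != 1

# Condition used by any_vars(n_distinct(.) != 1): some column varies
some_col_varies <- function(df) {
  any(vapply(df, function(x) length(unique(x)) != 1, logical(1)))
}

# The three groups of `foo`, split out by hand
groups <- list(
  g1 = data.frame(a = c(1, 1), b = c(1, 1)),
  g2 = data.frame(a = c(0, 1), b = c(2, 2)),
  g3 = data.frame(a = c(2, 2), b = c(0, 1))
)

sapply(groups, rows_differ)
sapply(groups, some_col_varies)
```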
Ritchie Sacramento
  • Actually, I don't think that `~.x > 0` is quite right. Try `-foo %>% my_function(c(a, b), by = group)` (i.e., negate all values in the data frame). – Artem Sokolov Aug 05 '20 at 03:45

An option with `across()`:

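my_function <- function(data, by, ...) {
  # capture the column arguments and convert them to name strings
  dots <- enquos(..., .named = TRUE)
  nm1 <- purrr::map_chr(dots, rlang::as_label)

  data %>%
    dplyr::group_by({{ by }}) %>%
    # one helper column per variable: is it non-constant within the group?
    dplyr::mutate(dplyr::across(dplyr::all_of(nm1), ~ dplyr::n_distinct(.) != 1,
                                .names = "{col}_ind")) %>%
    dplyr::ungroup() %>%
    # keep rows where any helper column is TRUE, then drop the helpers
    dplyr::filter(dplyr::select(., dplyr::ends_with("_ind")) %>% purrr::reduce(`|`)) %>%
    dplyr::select(-dplyr::ends_with("_ind"))
}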

my_function(foo, group, a, b)
# A tibble: 4 x 3
#  group     a     b
#  <dbl> <dbl> <dbl>
#1     2     0     2
#2     2     1     2
#3     3     2     0
#4     3     2     1

Or with `filter()` and `across()` directly:

foo %>%
   group_by(group) %>%
   filter(any(!across(c(a,b), ~ n_distinct(.) == 1)))
# A tibble: 4 x 3
# Groups:   group [2]
#  group     a     b
#  <dbl> <dbl> <dbl>
#1     2     0     2
#2     2     1     2
#3     3     2     0
#4     3     2     1
akrun