R Quasiquotation & tidyeval for dynamic variable references in R in own functions

Question

I'm trying to get my head around using tidyverse quasiquotation in my own R functions. I've read the question "Passing a list of arguments to a function with quasiquotation" and the whole tidy evaluation guide at https://tidyeval.tidyverse.org/

But I still can't get it to work.

Assume I have the following data:

dat <- data.frame(time   = runif(20),
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10),
                  group3 = rep(3:4, each = 10))

What I want to do now is to write a function that does the following:

- take a data set
- specify the variable that contains the time (note: in another data set it might be called "hours", "qtime" or whatever)
- specify the groups by which I want to compute operations/statistics

So I want the user to call the function like this:

test_function(data = dat, time_var = "time", group_vars = c("group1", "group3"))

Note: I might choose different grouping variables, or none at all, next time.

Let's say within the function I want to:

- calculate certain statistics on the time variable, e.g. the quantiles, split up by my grouping variables.
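For reference, a hard-coded version of that computation, with the column names typed directly instead of passed in as strings, could look like the sketch below. It assumes dplyr 1.0 or later (for the `.groups` argument) and only computes three of the quantiles to keep it short; the function is meant to generalize exactly this kind of pipeline.

```r
library(dplyr)

dat <- data.frame(time   = runif(20),
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10),
                  group3 = rep(3:4, each = 10))

# Column names are written out literally, so no quasiquotation is needed yet
hard_coded <- dat %>%
  group_by(group1, group3) %>%
  summarize(p0.1_time = quantile(time / 60, probs = 0.1, na.rm = TRUE),
            p0.5_time = quantile(time / 60, probs = 0.5, na.rm = TRUE),
            p0.9_time = quantile(time / 60, probs = 0.9, na.rm = TRUE),
            .groups = "drop")  # return an ungrouped result

hard_coded
```

The difficulty only starts once `time`, `group1` and `group3` have to come from arguments instead of being typed into the pipeline.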

Here's one of my latest tries:

test_function <- function(data, time_var = NULL, group_vars = NULL)
{
# Note: I initialize the variables with NULL, since e.g. the user might not specify a grouping,
# and I want to check for that in my function at some point
time_var <- enquo(time_var)
group_vars <- enquos(group_vars)

# Here I try to group by my grouping variables
temp_data <- data %>%
    group_by_at(group_vars) %>%
    mutate(!!sym(time_var) := !!sym(time_var) / 60)

# Here I'm calculating some stats  
time_stats <- temp_data %>%
    summarize_at(vars(!!time_var), list(p0.1_time   = ~quantile(., probs = 0.1, na.rm = T),
                                        p0.2_time   = ~quantile(., probs = 0.2, na.rm = T),
                                        p0.3_time   = ~quantile(., probs = 0.3, na.rm = T),
                                        p0.4_time   = ~quantile(., probs = 0.4, na.rm = T),
                                        p0.5_time   = ~quantile(., probs = 0.5, na.rm = T),
                                        p0.6_time   = ~quantile(., probs = 0.6, na.rm = T),
                                        p0.7_time   = ~quantile(., probs = 0.7, na.rm = T),
                                        p0.8_time   = ~quantile(., probs = 0.8, na.rm = T),
                                        p0.9_time   = ~quantile(., probs = 0.9, na.rm = T),
                                        p0.95_time  = ~quantile(., probs = 0.95, na.rm = T)))

}

What is wrong with my code? In particular, I struggle with !!, !!!, sym, enquo and enquos. Why doesn't group_by_at need !!, whereas my summarize and mutate do?
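A sketch of the two behaviours behind this question, assuming dplyr and rlang are installed (the helper `capture_like_attempt()` is hypothetical). First, when the caller passes the string "time", `enquo()` captures only that string constant, and `sym()` refuses the resulting quosure. Second, the scoped verb `group_by_at()` takes a character vector of column names as ordinary data and does the selection itself, so nothing needs unquoting, whereas `group_by()` and `mutate()` take column expressions, so strings must first become symbols via `sym()`/`syms()` and then be inserted with `!!`/`!!!`.

```r
library(dplyr)
library(rlang)

dat <- data.frame(time   = runif(20),
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10),
                  group3 = rep(3:4, each = 10))

# Hypothetical helper that captures its argument the way the attempt does
capture_like_attempt <- function(time_var) enquo(time_var)

q <- capture_like_attempt("time")
captured <- quo_get_expr(q)   # the string constant "time", not a column reference

# sym() turns a string into a symbol, but it does not accept a quosure,
# so sym(time_var) after enquo() fails
sym_error <- tryCatch(sym(q), error = function(e) conditionMessage(e))

grp_cols <- c("group1", "group3")
time_col <- "time"

# Scoped verb: takes the character vector as data, no unquoting needed
by_at <- dat %>% group_by_at(grp_cols)

# Regular verbs: convert strings to symbols, then splice (!!!) or unquote (!!)
by_sym  <- dat %>% group_by(!!!syms(grp_cols))
minutes <- dat %>% mutate(!!sym(time_col) := !!sym(time_col) / 60)

identical(group_vars(by_at), group_vars(by_sym))  # TRUE
```

Note that `group_by_at()` is superseded in dplyr 1.0 and later, but it is still available.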

You should only ask one question at a time. Focus on one clear question with a clear [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output. Broad questions asking for "hints" or "best ways" to do things without defining what "best" means exactly are off topic here. — MrFlick, Nov 13 '19 at 19:21
I edited my question so that it is more focused now. I find it a bit confusing not being "allowed" to ask about "hints" and "best ways". At my current stage of learning R, these (differing) opinions would be quite valuable to me, especially in cases where there is no right or wrong answer. — deschen, Nov 13 '19 at 19:54
@G.Grothendieck I hope this gets closer to what you'd expect. — deschen, Nov 13 '19 at 20:26

G. Grothendieck · Answer 1 · 2019-11-13T23:24:53.690

Make these changes:

- use sym and syms rather than enquo and enquos
- use !! and !!! respectively
- create po as a list and then use unnest_wider to expand it into columns
- quantile is already vectorized so we don't need map
- the mutate can be incorporated right into the quantile call, eliminating it
- consolidate the pipelines into a single pipeline
- use TRUE rather than T, since the latter can be masked by a variable of that name whereas no variable may be called TRUE
- we can use plain group_by and summarize
- there is no group3 in the sample data so we used group2 instead
- this does not make sense without time_var so remove the default of NULL
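The effect of the first two changes can be made visible with `rlang::expr()`, which returns the call a verb would receive after unquoting without evaluating it. A minimal sketch assuming rlang is installed; the column names are the ones from the example data, and `p` is just left as a symbol here.

```r
library(rlang)

time_var   <- sym("time")                  # a single symbol
group_vars <- syms(c("group1", "group2"))  # a list of symbols

# !!! splices each list element in as a separate argument; !! inserts one symbol
grouping_call <- expr(group_by(!!!group_vars))
summary_call  <- expr(quantile(!!time_var / 60, p, na.rm = TRUE))

# Splicing plain strings instead gives string constants, not column references
string_call <- expr(group_by(!!!c("group1", "group2")))

grouping_call  # group_by(group1, group2)
summary_call   # quantile(time/60, p, na.rm = TRUE)
string_call    # group_by("group1", "group2")
```

The last call is why the strings have to go through `sym()`/`syms()` first: dplyr would treat `"group1"` as a constant value, not as the column `group1`.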

This gives the following code

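library(dplyr)  # group_by(), summarize(); dplyr also re-exports rlang's sym() and syms()
library(tidyr)  # unnest_wider()

test_function <- function(data, time_var, group_vars = NULL) {
  p <- c(1:9/10, 0.95)             # probabilities passed to quantile()
  time_var <- sym(time_var)        # string -> symbol
  group_vars <- syms(group_vars)   # character vector -> list of symbols (empty list if NULL)
  data %>%
    group_by(!!!group_vars) %>%    # splice each grouping symbol in as its own argument
    summarize(po = list(quantile(!!time_var / 60, p, na.rm = TRUE))) %>%
    ungroup %>%
    unnest_wider(po)               # one column per quantile
}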

test_function(data = dat, time_var = "time", group_vars = c("group1", "group2"))

giving:

# A tibble: 4 x 12
  group1 group2   `10%`   `20%`   `30%`   `40%`   `50%`   `60%`   `70%`   `80%`   `90%`   `95%`
   <int>  <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1      1      1 0.00237 0.00432 0.00654 0.00903 0.0115  0.0120  0.0124  0.0133  0.0147  0.0154 
2      1      2 0.00244 0.00251 0.00281 0.00335 0.00388 0.00410 0.00432 0.00493 0.00591 0.00640
3      2      1 0.00371 0.00381 0.00468 0.00632 0.00796 0.0101  0.0122  0.0136  0.0143  0.0147 
4      2      2 0.00385 0.00538 0.00630 0.00660 0.00691 0.00725 0.00759 0.00907 0.0117  0.0130
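The question also asks for calls with no grouping variables. Because `syms(NULL)` returns an empty list, `!!!` splices nothing and `group_by()` leaves the data ungrouped, so the same function should return a single row. This sketch repeats the answer's function so it runs on its own; it assumes dplyr, rlang and tidyr 1.0 or later (for `unnest_wider()`), and uses `group3` from the question's data frame.

```r
library(dplyr)
library(rlang)
library(tidyr)

dat <- data.frame(time   = runif(20),
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10),
                  group3 = rep(3:4, each = 10))

test_function <- function(data, time_var, group_vars = NULL) {
  p <- c(1:9/10, 0.95)
  time_var <- sym(time_var)
  group_vars <- syms(group_vars)
  data %>%
    group_by(!!!group_vars) %>%
    summarize(po = list(quantile(!!time_var / 60, p, na.rm = TRUE))) %>%
    ungroup %>%
    unnest_wider(po)
}

# No grouping variables: syms(NULL) is an empty list, so nothing is spliced
overall <- test_function(data = dat, time_var = "time")

# A different set of grouping variables, as in the question's intended call
by_g1_g3 <- test_function(data = dat, time_var = "time",
                          group_vars = c("group1", "group3"))
```

Keeping `group_vars = NULL` as the default therefore costs nothing: the same code path handles zero, one or several grouping columns.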

Nice. This code looks great. I now also see at least one of my mistakes: when providing the group variables as strings, I need to use `sym` rather than `enquo`, which is for when they are provided as a variable list. I also read the tidyeval documentation again last night, which, together with your code, shed a bit more light on the whole topic. — deschen, Nov 14 '19 at 09:54
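For contrast with the string interface, here is a sketch of the same computation taking bare column names, which is the situation where `enquo()`/`enquos()` fit. The name `test_function_bare` and the use of `...` for the grouping columns are illustrative assumptions, not part of the answer; it assumes dplyr, rlang and tidyr 1.0 or later.

```r
library(dplyr)
library(rlang)
library(tidyr)

dat <- data.frame(time   = runif(20),
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10),
                  group3 = rep(3:4, each = 10))

# Hypothetical variant: columns are passed unquoted, e.g. time, group1, group3
test_function_bare <- function(data, time_var, ...) {
  p <- c(1:9/10, 0.95)
  time_var <- enquo(time_var)   # capture the bare name together with its environment
  group_vars <- enquos(...)     # capture zero or more bare grouping names
  data %>%
    group_by(!!!group_vars) %>%
    summarize(po = list(quantile(!!time_var / 60, p, na.rm = TRUE))) %>%
    ungroup() %>%
    unnest_wider(po)
}

grouped_bare <- test_function_bare(dat, time, group1, group3)
overall_bare <- test_function_bare(dat, time)
```

The unquoting with `!!` and `!!!` is the same as in the string version; only the capture step differs (`enquo()`/`enquos()` for bare names, `sym()`/`syms()` for strings).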
