
I'm trying to get my head around using tidyverse quasiquotation in my own R functions. I've read the question "Passing a list of arguments to a function with quasiquotation" and the whole tidy evaluation book at https://tidyeval.tidyverse.org/

But I still can't get it to work.

Assume I have the following data:

dat <- data.frame(time   = runif(20),
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10),
                  group3 = rep(3:4, each = 10))

What I want to do now is to write a function that does the following:

  • take a data set
  • specify the variable that contains the time (note: in another data set it might be called "hours", "qtime" or something else)
  • specify the grouping variables by which I want to compute statistics

So I want the user to call the function like this:

test_function(data = dat, time_var = "time", group_vars = c("group1", "group3"))

Note: next time I might choose different grouping variables, or none.

Let's say within the function I want to:

  • calculate certain statistics on the time variable, e.g. its quantiles, split up by my grouping variables

Here's one of my latest tries:

test_function <- function(data, time_var = NULL, group_vars = NULL)
{
# Note: I initialize the variables with NULL, since e.g. the user might not specify a grouping
# (and I want to check for that in my function at some point)
time_var <- enquo(time_var)
group_vars <- enquos(group_vars)

# Here I try to group by my grouping variables
temp_data <- data %>%
    group_by_at(group_vars) %>%
    mutate(!!sym(time_var) := !!sym(time_var) / 60)

# Here I'm calculating some stats  
time_stats <- temp_data %>%
    summarize_at(vars(!!time_var), list(p0.1_time   = ~quantile(., probs = 0.1, na.rm = T),
                                        p0.2_time   = ~quantile(., probs = 0.2, na.rm = T),
                                        p0.3_time   = ~quantile(., probs = 0.3, na.rm = T),
                                        p0.4_time   = ~quantile(., probs = 0.4, na.rm = T),
                                        p0.5_time   = ~quantile(., probs = 0.5, na.rm = T),
                                        p0.6_time   = ~quantile(., probs = 0.6, na.rm = T),
                                        p0.7_time   = ~quantile(., probs = 0.7, na.rm = T),
                                        p0.8_time   = ~quantile(., probs = 0.8, na.rm = T),
                                        p0.9_time   = ~quantile(., probs = 0.9, na.rm = T),
                                        p0.95_time  = ~quantile(., probs = 0.95, na.rm = T)))

}

What is wrong with my code? Specifically, I struggle with !!, !!!, sym, enquo and enquos. Why doesn't group_by_at need !!, whereas my summarize and mutate calls do?
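A minimal sketch of the distinction behind this question, assuming dplyr and rlang are installed and using a small hypothetical data frame `toy` (not the question's `dat`). group_by_at() takes its column names as an ordinary value (a character vector or vars()), so nothing has to be unquoted. mutate() and summarize() quote their arguments instead, so a column name held in a string must first be turned into a symbol with sym() and then injected with !!. Note also that sym() expects a string, so feeding it the quosure returned by enquo(), as the attempt above does, is one place where that code goes wrong.

```r
library(dplyr)
library(rlang)

# Hypothetical toy data, chosen so the results are easy to check by hand
toy <- data.frame(time   = c(30, 60, 90, 120),
                  group1 = c(1, 1, 2, 2))

time_var   <- "time"     # column name held as a string
group_vars <- "group1"   # grouping column(s) held as strings

# group_by_at() evaluates .vars normally: a character vector works as-is,
# so no !! is needed here.
by_at <- toy %>% group_by_at(group_vars) %>% summarize(n = n())

# mutate() quotes its arguments: "time" / 60 would fail (a string cannot
# be divided), so convert the string with sym() and inject it with !!.
scaled <- toy %>% mutate(!!sym(time_var) := !!sym(time_var) / 60)

# For several names, syms() returns a list of symbols, spliced with !!!.
by_syms <- toy %>% group_by(!!!syms(group_vars)) %>% summarize(n = n())
```

Both grouped summaries give the same two rows; only the way the column names reach dplyr differs.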

MrFlick
deschen
  • You should only ask one question at a time. Focus on one clear question with a clear [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output. Broad questions asking for "hints" or "best ways" to do things without defining what "best" means exactly are off topic here. – MrFlick Nov 13 '19 at 19:21
  • I edited my questions so that they are more focused now. I find it a bit confusing not being "allowed" to ask about "hints" and "best ways". At my current stage of learning R, these (different) opinions would be quite valuable to me, especially in cases where there is no right or wrong answer. – deschen Nov 13 '19 at 19:54
  • @G.Grothendieck I hope this gets closer to what you'd expect. – deschen Nov 13 '19 at 20:26

1 Answer

Make these changes:

  • use sym and syms rather than enquo and enquos, since the variable names are passed in as strings
  • use !! and !!! respectively.
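  • create po as a list-column and then use unnest_wider to expand it into columns
  • quantile is already vectorized over its probs argument, so a single call returns all the quantiles and we don't need map or one function per probability
  • the mutate can be folded into the quantile call (dividing by 60 there), eliminating it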
  • consolidate the pipelines into a single pipeline
  • use TRUE rather than T since the latter can be masked by a variable of that name whereas no variable may be called TRUE.
  • we can use plain group_by and summarize
  • the call below uses group2 rather than group3; in the sample data group3 (rep(3:4, each = 10)) splits the rows exactly as group2 does, so the groups are the same
  • this does not make sense without time_var so remove the default of NULL
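To make the quantile and unnest_wider points concrete, here is a small sketch, assuming dplyr and tidyr are installed and using a hypothetical data frame `toy`. One quantile() call with a vector of probabilities returns a named numeric vector; wrapping it in list() inside summarize() stores one such vector per group, and unnest_wider() spreads the names into columns.

```r
library(dplyr)
library(tidyr)

p <- c(1:9/10, 0.95)

# quantile() is vectorized over probs: one call returns every requested
# quantile as a named numeric vector ("10%", "20%", ..., "90%", "95%").
q <- quantile(1:100, p)

# Hypothetical toy data with two groups of five values each
toy <- data.frame(g = rep(1:2, each = 5), x = 1:10)

# list() keeps each group's named vector in a single list-column cell;
# unnest_wider() then turns each name into its own column.
wide <- toy %>%
  group_by(g) %>%
  summarize(po = list(quantile(x, c(0.5, 0.9)))) %>%
  ungroup() %>%
  unnest_wider(po)
```

With the default quantile type, group 1 (values 1 to 5) has median 3 and 90% quantile 4.6, and group 2 (values 6 to 10) has 8 and 9.6.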

This gives the following code:

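library(dplyr)  # group_by(), summarize(), ungroup(), %>%
library(tidyr)  # unnest_wider()
library(rlang)  # sym(), syms()

test_function <- function(data, time_var, group_vars = NULL) {
  p <- c(1:9/10, 0.95)             # probabilities 0.1, ..., 0.9 and 0.95
  time_var <- sym(time_var)        # string -> symbol
  group_vars <- syms(group_vars)   # character vector -> list of symbols
  data %>%
    group_by(!!!group_vars) %>%    # splice the grouping symbols
    summarize(po = list(quantile(!!time_var / 60, p, na.rm = TRUE))) %>%
    ungroup() %>%
    unnest_wider(po)               # one column per quantile
}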

test_function(data = dat, time_var = "time", group_vars = c("group1", "group2")) 

giving:

# A tibble: 4 x 12
  group1 group2   `10%`   `20%`   `30%`   `40%`   `50%`   `60%`   `70%`   `80%`   `90%`   `95%`
   <int>  <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1      1      1 0.00237 0.00432 0.00654 0.00903 0.0115  0.0120  0.0124  0.0133  0.0147  0.0154 
2      1      2 0.00244 0.00251 0.00281 0.00335 0.00388 0.00410 0.00432 0.00493 0.00591 0.00640
3      2      1 0.00371 0.00381 0.00468 0.00632 0.00796 0.0101  0.0122  0.0136  0.0143  0.0147 
4      2      2 0.00385 0.00538 0.00630 0.00660 0.00691 0.00725 0.00759 0.00907 0.0117  0.0130 
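To check the case where no grouping is given (the reason for the NULL default in the question), here is a sketch that reuses the function above on a small hypothetical data set `dat2` whose time column is named `hours`; it assumes dplyr, tidyr and rlang are installed. Because syms(NULL) is an empty list, !!! splices nothing into group_by(), and the result is a single row of overall quantiles.

```r
library(dplyr)
library(tidyr)
library(rlang)

test_function <- function(data, time_var, group_vars = NULL) {
  p <- c(1:9/10, 0.95)
  time_var   <- sym(time_var)
  group_vars <- syms(group_vars)   # syms(NULL) gives an empty list
  data %>%
    group_by(!!!group_vars) %>%    # splicing an empty list: no grouping
    summarize(po = list(quantile(!!time_var / 60, p, na.rm = TRUE))) %>%
    ungroup() %>%
    unnest_wider(po)
}

# Hypothetical data whose time column is called "hours" instead of "time"
dat2 <- data.frame(hours  = (1:20) * 60,
                   group1 = rep(1:2, times = 10))

overall  <- test_function(dat2, time_var = "hours")
by_group <- test_function(dat2, time_var = "hours", group_vars = "group1")
```

Here `overall` has one row with the ten quantile columns, while `by_group` has one row per value of group1.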
G. Grothendieck
  • Nice. This code looks great. I now also see at least one of my mistakes: when providing the group variables as strings I need to use `sym`, as opposed to `enquo` when providing them as bare variable names. I also read the tidyeval documentation again last night, which (together with your code) shed a bit more light on the whole topic. – deschen Nov 14 '19 at 09:54
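For contrast with the string-based answer, here is a sketch of the bare-name style mentioned in the last comment, where enquo() and enquos() are the right tools. The function name `quant_by` and the data frame `toy` are hypothetical, and dplyr, tidyr and rlang are assumed to be installed. The time column is captured with enquo(), any number of grouping columns arrive through `...` and are captured with enquos(), and both are injected with !! and !!! as before.

```r
library(dplyr)
library(tidyr)
library(rlang)

# Hypothetical variant: columns are passed as bare names,
# e.g. quant_by(toy, time, group1, group2), instead of as strings.
quant_by <- function(data, time_var, ...) {
  p <- c(1:9/10, 0.95)
  time_var   <- enquo(time_var)   # capture the unevaluated argument
  group_vars <- enquos(...)       # capture zero or more bare names
  data %>%
    group_by(!!!group_vars) %>%
    summarize(po = list(quantile(!!time_var / 60, p, na.rm = TRUE))) %>%
    ungroup() %>%
    unnest_wider(po)
}

# Hypothetical data with deterministic times so results can be checked
toy <- data.frame(time   = (1:20) * 60,
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10))

res <- quant_by(toy, time, group1, group2)
```

With rlang 0.4.0 or later, `{{ time_var }}` can be written inside the pipeline as a shorthand for the enquo() plus !! pair.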