I need to compute several quantiles from a single numeric vector, and I want to use dplyr::summarise for that. Here's what I have:

library(dplyr)
library(rlang)

quantiles <- function(data, group, ...){
  group <- enquo(group)
  value_vars <- quos(...)
  data %>%
    group_by(!!group) %>%
    summarise_at(vars(!!!value_vars), funs(
      median = median,
      q1 = quantile(., probs = 0.25),
      q3 = quantile(., probs = 0.75))
    ) %>%
    ungroup()
}
quantiles(data = iris, group = Species, Sepal.Length, Petal.Width)

It works, but it triggers the note "no visible binding for global variable '.'" when checking the package, so I'm looking for a way to get rid of . in the function. I can replace summarise_at with a series of mutate_at calls and then summarise with first, but that gets quite heavy:

quantiles <- function(data, group, ...){
  group <- enquo(group)
  value_vars <- quos(...)
  data %>%
    group_by(!!group) %>%
    mutate_at(vars(!!!value_vars), funs(median = median)) %>%
    mutate_at(vars(!!!value_vars), funs(q1 = quantile), probs = 0.25) %>%
    mutate_at(vars(!!!value_vars), funs(q3 = quantile), probs = 0.75) %>%
    summarise_at(vars(matches('(median|q1|q3)$')), first) %>%
    ungroup()
}
quantiles(data = iris, group = Species, Sepal.Length, Petal.Width)

Edit: use purrr::map2

I can build a list of functions with the desired secondary argument values:

quantile_funs <- purrr::map2(
  .x = list(median = median, q1 = quantile, q3 = quantile),
  .y = list(NULL, 0.25, 0.75),
  .f = function(fun, arg){
    function(x) fun(x, probs = arg)
  }
)

quantiles <- function(data, group, ...){
  group <- enquo(group)
  value_vars <- quos(...)
  data %>%
    group_by(!!group) %>%
    summarise_at(vars(!!!value_vars), .funs = quantile_funs) %>%
    ungroup()
}
quantiles(data = iris, group = Species, Sepal.Length, Petal.Width)

This works, but only by luck: median has a ... argument, which lets me call median(x, probs = NULL) even though median has no probs argument.
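One way to avoid relying on that is to write each summary function out explicitly as a one-argument closure, so that probs is only ever passed to quantile. The following is a minimal sketch, assuming a dplyr version in which summarise_at accepts a named list of functions (0.8.0 or later, as far as I know); names = FALSE is my addition to drop the "25%"-style names that quantile attaches to its result.

```r
library(dplyr)

# Named list of one-argument functions; the list names become column suffixes.
quantile_funs <- list(
  median = function(x) median(x),
  q1     = function(x) quantile(x, probs = 0.25, names = FALSE),
  q3     = function(x) quantile(x, probs = 0.75, names = FALSE)
)

quantiles <- function(data, group, ...) {
  group <- enquo(group)
  value_vars <- quos(...)
  data %>%
    group_by(!!group) %>%
    summarise_at(vars(!!!value_vars), .funs = quantile_funs) %>%
    ungroup()
}

res <- quantiles(data = iris, group = Species, Sepal.Length, Petal.Width)
res
```

Since no . appears anywhere in the function bodies, this form should also not trigger the note about '.'.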

I tried the following, but it did not work:

quantile_funs <- purrr::map2(
  .x = list(median = median, q1 = quantile, q3 = quantile),
  .y = list(list(NULL = NULL), list(probs = 0.25), list(probs = 0.75)),
  .f = function(fun, arg){
    function(x) fun(x, splice(arg))
  }
)
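My understanding of why this attempt fails, to be taken with caution: NULL is a reserved word, so list(NULL = NULL) does not parse, and splice() is only honoured by functions that collect their dots through rlang, which base functions such as quantile do not. A sketch of the same idea with base do.call instead, where each function is paired with a (possibly empty) list of extra arguments; make_summary_fun is a hypothetical helper name introduced only for this illustration.

```r
library(purrr)

# Hypothetical helper: returns function(x) that calls fun(x, <args...>).
make_summary_fun <- function(fun, args) {
  force(fun)   # evaluate now so each closure keeps its own fun/args
  force(args)
  function(x) do.call(fun, c(list(x), args))
}

quantile_funs <- map2(
  .x = list(median = median, q1 = quantile, q3 = quantile),
  .y = list(list(), list(probs = 0.25), list(probs = 0.75)),
  .f = make_summary_fun
)

quantile_funs$median(iris$Sepal.Length)
quantile_funs$q1(iris$Sepal.Length)
```

With an empty list for median, no probs argument is passed to it at all, so nothing depends on median accepting extra arguments through ... any more. The resulting list can be given to .funs in summarise_at in the same way as before.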
Romain
  • I suggest using the `purrr` package. It has a lot of map functions, and they work well with `dplyr`. – igorkf Feb 28 '19 at 13:32
  • Thanks, I tried mapping functions to arguments and returning a partial. It does work, but only by luck. – Romain Feb 28 '19 at 14:10
  • Not quite a dupe, but I asked a similar [question](https://stackoverflow.com/q/53288100/5325862) a while back and got really good answers – camille Feb 28 '19 at 14:13
  • Thanks, I had a look at it and it's indeed pretty close. The main difference is that I'm using several different mutate/summarise functions instead of one, isn't it? I'll dive into the solutions you were given; I will likely find a solution there – Romain Feb 28 '19 at 14:25

1 Answer

Here is one option using the mapply function:

library('data.table')
quantiles <- function(data, group, v.names, quantile = c(.25, 0.5, .75)){
  data <- as.data.table(data)
  gLevels <- levels(data[, get(group)])
  quantileDT <- as.data.table(
    expand.grid(v.name = v.names, grp = gLevels, quantile = quantile,
                stringsAsFactors = FALSE))
  quantileDT[, Value:= 
               mapply(function(v, g, q) quantile(data[get(group) == g, get(v)],  q),
                      v = v.name, 
                      g = grp, 
                      q = quantile)]

  dcast(quantileDT, grp ~ v.name + quantile, value.var = 'Value')
}

quantiles(data = iris, group = 'Species', v.names = c('Sepal.Length', 'Petal.Width'))

This could perhaps use some cleaning up, e.g. using data and quantile as variable names is not such a great idea. Here is the output you get:

          grp Petal.Width_0.25 Petal.Width_0.5 Petal.Width_0.75 Sepal.Length_0.25 Sepal.Length_0.5 Sepal.Length_0.75
1:     setosa              0.2             0.2              0.3             4.800              5.0               5.2
2: versicolor              1.2             1.3              1.5             5.600              5.9               6.3
3:  virginica              1.8             2.0              2.3             6.225              6.5               6.9
Andrew Royal
  • That works indeed, thanks, but I got used to writing variable names without quotes due to intensive usage of the tidyverse, so I'm not a big fan of passing them as strings :) – Romain Feb 28 '19 at 15:31