I have this great little function summarise_posterior
(given below) as part of my package driver
(available here).
The function is great and super useful. The one problem is that I have been working on larger and larger data and it can be very slow. In short, my question is: Is there a tidyverse-esque way of speeding this up while still retaining the key flexibility of this function (see examples in documentation).
At least one key speed up could come from figuring out how to put the calculation of the quantiles in a single call rather than calling the quantile function over and over. The latter option which is currently implemented is probably re-sorting same vectors over and over.
#' Shortcut for summarize variable with quantiles and mean
#'
#' @param data tidy data frame
#' @param var variable name (unquoted) to be summarised
#' @param ... other expressions to pass to summarise
#'
#' @return data.frame
#' @export
#' @details Notation: \code{pX} refers to the \code{X}\% quantile
#' @import dplyr
#' @importFrom stats quantile
#' @importFrom rlang quos quo UQ
#' @examples
#' d <- data.frame("a"=sample(1:10, 50, TRUE),
#' "b"=rnorm(50))
#'
#' # Summarize posterior for b over grouping of a and also calcuate
#' # minmum of b (in addition to normal statistics returned)
#' d <- dplyr::group_by(d, a)
#' summarise_posterior(d, b, mean.b = mean(b), min=min(b))
summarise_posterior <- function(data, var, ...){
qvar <- enquo(var)
qs <- quos(...)
data %>%
summarise(p2.5 = quantile(!!qvar, prob=0.025),
p25 = quantile(!!qvar, prob=0.25),
p50 = quantile(!!qvar, prob=0.5),
mean = mean(!!qvar),
p75 = quantile(!!qvar, prob=0.75),
p97.5 = quantile(!!qvar, prob=0.975),
!!!qs)
}
Rcpp back-end options are also more than welcome.
Thanks!