I have a template that I use to aggregate my data from its source into means and 95% confidence intervals so I can plot them in ggplot (adapted from a Stack Overflow post many years ago; apologies, but I don't know the original source). It looks like this:

library(dplyr)

data %>%
  group_by(var1, var2) %>%
  summarise(count = n(),
            mean.outcome_variable = mean(outcome_variable, na.rm = TRUE),
            sd.outcome_variable = sd(outcome_variable, na.rm = TRUE),
            # count only non-missing values so n matches the na.rm = TRUE mean and sd
            n.outcome_variable = sum(!is.na(outcome_variable)),
            total.outcome_variable = sum(outcome_variable, na.rm = TRUE)) %>%
  mutate(se.outcome_variable = sd.outcome_variable / sqrt(n.outcome_variable),
         # 95% CI: mean +/- t(0.975, n - 1) * SE
         lower.ci.outcome_variable = mean.outcome_variable - qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable,
         upper.ci.outcome_variable = mean.outcome_variable + qt(1 - (0.05 / 2), n.outcome_variable - 1) * se.outcome_variable)
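Written out, the interval the template computes for each group is the standard t-based confidence interval for a mean. Here s is the group's standard deviation, n its number of observations, and the t quantile is what `qt(1 - (0.05 / 2), n - 1)` returns:

```latex
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
\text{95\% CI} = \bar{x} \pm t_{1 - 0.05/2,\; n - 1} \times \mathrm{SE}
```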

This works well with one or two outcome variables, but copying and pasting it becomes impractical with a large number of them. Since all my outcome variables are numeric, I was hoping to use summarise_if() instead. However, I don't know how to pass anything more complex than a simple function such as mean or sd inside funs(). I have tried gmodels::ci() as follows:

dataset_aggregated <- data %>%
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lowCI = ci()[2], hiCI = ci()[3])) # does not work without brackets either

However, this results in:

Error in summarise_impl(.data, dots) : 
  Evaluation error: no applicable method for 'ci' applied to an object of class "NULL".

How do I get this to work?

Mel

  • See the help files for the scoped variants of `summarise()`. It would look something like this: `summarise_if(is.numeric, list(mean = ~mean(.), lowCI = ~ci(.)[2], hiCI = ~ci(.)[3]))` – Andrew Dec 11 '19 at 14:44
  • Thanks! This will come in really handy! – Mel Dec 11 '19 at 14:54
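The scoped variants mentioned in the comment were later superseded by across() in dplyr 1.0. A minimal sketch of the same idea with across(), assuming dplyr 1.0 or later; the toy data frame and its columns (y1, y2) are hypothetical stand-ins for the question's data, and t.test()$conf.int is swapped in for gmodels::ci() so the example has no extra dependency:

```r
library(dplyr)

# Hypothetical toy data standing in for the question's `data`.
toy <- data.frame(
  var1 = rep(c("a", "b"), each = 6),
  var2 = rep(c("x", "y"), times = 6),
  y1   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  y2   = c(2, 4, 6, 8, 10, 12, 1, 3, 5, 7, 9, NA)
)

# across() applies every function in the named list to every numeric,
# non-grouping column; t.test() drops NAs and returns a two-sided 95% interval.
aggregated <- toy %>%
  group_by(var1, var2) %>%
  summarise(
    across(where(is.numeric),
           list(lci  = ~ t.test(.x)$conf.int[1],
                mean = ~ mean(.x, na.rm = TRUE),
                uci  = ~ t.test(.x)$conf.int[2])),
    .groups = "drop"
  )

aggregated
```

Columns come out as y1_lci, y1_mean, y1_uci, y2_lci, and so on, because across() names them "{column}_{function}" and orders them column by column, so listing the functions as lci, mean, uci gives that layout without a separate sorting step. Note that t.test() errors on groups with fewer than two non-missing values.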

1 Answer

I worked out how to do this just as I was getting the question ready to post, but I thought I'd share it in case anyone else runs into the same issue: the answer is surprisingly simple, and I can't believe it took me so long to think of it. I wrote custom lci() and uci() functions that pull the lower and upper confidence limits (the second and third elements) out of the result of gmodels::ci(), and called those instead, e.g.

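library(dplyr)
library(gmodels) # provides ci()

# For a numeric vector, ci() returns: Estimate, CI lower, CI upper, Std. Error
lci <- function(data) {
  as.numeric(ci(data)[2]) # lower 95% limit
}

uci <- function(data) {
  as.numeric(ci(data)[3]) # upper 95% limit
}

dataset_aggregated <- dataset %>%
  group_by(var1, var2) %>% # group by as many variables as you like, then list them in select() below
  summarise_if(is.numeric, list(mean = mean, lci = lci, uci = uci)) %>% # named list replaces the deprecated funs()
  select(var1, var2, sort(tidyselect::peek_vars())) # sort alphabetically so each outcome reads x_lci, x_mean, x_uci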
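If you would rather not depend on gmodels, the helpers can be written with base R's qt(). A minimal sketch, assuming you want the same mean ± t × SE interval that, as far as I can tell, gmodels::ci() reports for a numeric vector; the names ci_bound(), lci_base() and uci_base() are hypothetical, and unlike ci()'s default these drop missing values:

```r
# Hypothetical base-R replacements for lci()/uci() that avoid the gmodels dependency.
# Both compute the two-sided t interval: mean +/- t(1 - alpha/2, n - 1) * sd / sqrt(n).
ci_bound <- function(x, side, confidence = 0.95) {
  x <- x[!is.na(x)]                          # drop missing values
  n <- length(x)
  half_width <- qt(1 - (1 - confidence) / 2, n - 1) * sd(x) / sqrt(n)
  mean(x) + side * half_width                # side = -1 for lower, +1 for upper
}

lci_base <- function(x, confidence = 0.95) ci_bound(x, -1, confidence)
uci_base <- function(x, confidence = 0.95) ci_bound(x, +1, confidence)

x <- c(10.2, 9.8, 11.1, 10.5, 9.9)
c(lower = lci_base(x), upper = uci_base(x))
```

This is the same interval that t.test(x)$conf.int reports for a one-sample t-test, which makes a convenient cross-check. Because each helper takes a numeric vector and returns a single number, it can be used in the summarise_if() call in place of lci and uci.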

Mel