How can I pass a column name as a function argument using dplyr and ggplot2?

Question

I am trying to write a function that will spit out model diagnostic plots.

to_plot <- function(df, model, response_variable, indep_variable) {
  resp_plot <- 
    df %>%
    mutate(model_resp = predict.glm(model, df, type = 'response')) %>%
    group_by(indep_variable) %>%
    summarize(actual_response = mean(response_variable),
              predicted_response = mean(model_resp)) %>%
    ggplot(aes(indep_variable)) + 
    geom_line(aes(x = indep_variable, y = actual_response, colour = "actual")) + 
    geom_line(aes(x = indep_variable, y = predicted_response, colour = "predicted")) +
    ylab(label = 'Response')

}

When I run this over a dataset, dplyr throws an error that I don't understand:

fit <- glm(data = mtcars, mpg ~ wt + qsec + am, family = gaussian(link = 'identity'))
to_plot(mtcars, fit, mpg, wt)

 Error in grouped_df_impl(data, unname(vars), drop) : 
  Column `indep_variable` is unknown

Based on some crude debugging, I found that the error happens in the group_by step, so it could be related to how I'm calling the columns in the function. Thanks!

you need another layer of complexity to deal with *standard evaluation* (i.e., use the value that `indep_variable` stands for, rather than searching for `indep_variable` itself): https://stackoverflow.com/questions/44593596/how-to-pass-strings-denoting-expressions-to-dplyr-0-7-verbs/44593617#44593617 — Ben Bolker, Jul 26 '17 at 17:11
It's because dplyr works with non-standard evaluation. Hadley explains NSE here: http://dplyr.tidyverse.org/articles/programming.html and a pretty nice webinar here: https://www.rstudio.com/resources/webinars/whats-new-in-dplyr-0-7-0/ — biomiha, Jul 26 '17 at 17:12
Thanks. Based on your responses I've added a proposed answer below, but would appreciate feedback on ways to make it more clear. — joe, Jul 27 '17 at 04:43

joe · Accepted Answer · 2017-07-28T02:40:17.383

This code seems to fix it. As the commenters above mention, column names passed in to the function must be captured with enquo() and then unquoted with !! inside the dplyr verbs. Note that aes() becomes aes_() here, because aes_() accepts quoted expressions (such as the quosures returned by enquo()) rather than bare column names.

library(tidyverse)

to_plot <- function(df, model, response_variable, indep_variable) {
  # Capture the column names the caller typed, without evaluating them
  response_variable <- enquo(response_variable)
  indep_variable <- enquo(indep_variable)

  resp_plot <- 
    df %>%
    mutate(model_resp = predict.glm(model, df, type = 'response')) %>%
    group_by(!!indep_variable) %>%
    summarize(actual_response = mean(!!response_variable),
              predicted_response = mean(model_resp)) %>%
    ggplot(aes_(indep_variable)) + 
    geom_line(aes_(x = indep_variable, y = quote(actual_response)), colour = "blue") + 
    geom_line(aes_(x = indep_variable, y = quote(predicted_response)), colour = "red") +
    ylab(label = 'Response')

  return(resp_plot)
}

fit <- glm(data = mtcars, mpg ~ wt + qsec + am, family = gaussian(link = 'identity'))
to_plot(mtcars, fit, mpg, wt)
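
To see what enquo() and !! do on their own, here is a smaller sketch that leaves out the model and the plot. The helper name mean_by and the output column avg are hypothetical, and the code assumes dplyr 0.7.0 or later.

```r
library(dplyr)

# Hypothetical helper: mean of one column within groups of another.
mean_by <- function(df, group_col, value_col) {
  # enquo() captures the expression the caller typed (e.g. cyl),
  # instead of trying to evaluate it as an ordinary variable
  group_col <- enquo(group_col)
  value_col <- enquo(value_col)

  df %>%
    group_by(!!group_col) %>%            # !! unquotes the captured column name
    summarize(avg = mean(!!value_col))
}

# Bare column names work, just as they do in interactive dplyr code
res <- mean_by(mtcars, cyl, mpg)
print(res)
```

Without the enquo() and !! pair, group_by() would look for a column literally called group_col, which is the same failure as the "Column `indep_variable` is unknown" error in the question.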

This works but isn't very graceful. Please feel free to edit with improvements, I don't think I have my head fully wrapped around this. — joe, Jul 28 '17 at 02:40
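
A shorter spelling may be possible with newer package versions. The sketch below is not from the original answer: it assumes rlang 0.4.0 or later and ggplot2 3.0.0 or later, where the {{ }} ("curly-curly") operator does the enquo() and !! steps in one go and aes() accepts it directly. The function name to_plot2 is hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical variant of to_plot() using {{ }} instead of enquo() / !! / aes_()
to_plot2 <- function(df, model, response_variable, indep_variable) {
  df %>%
    mutate(model_resp = predict(model, newdata = df, type = "response")) %>%
    group_by({{ indep_variable }}) %>%
    summarize(actual_response = mean({{ response_variable }}),
              predicted_response = mean(model_resp)) %>%
    ggplot(aes(x = {{ indep_variable }})) +
    # Mapping colour to a label inside aes() produces a legend entry per line
    geom_line(aes(y = actual_response, colour = "actual")) +
    geom_line(aes(y = predicted_response, colour = "predicted")) +
    ylab("Response")
}

fit <- glm(mpg ~ wt + qsec + am, data = mtcars,
           family = gaussian(link = "identity"))
p <- to_plot2(mtcars, fit, mpg, wt)
print(p)
```

Because aes() understands {{ }} under these assumptions, the quote() calls and aes_() from the answer above are no longer needed, and the function body reads much like the original attempt in the question.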
