
I am trying to write a user-defined function that takes several arguments, summarises data with dplyr, and then plots the result with ggplot.

Here is some sample data, along with the dplyr summary and plot I am trying to reproduce:

library(dplyr)
library(ggplot2)

df <- data.frame(
  Year = c("2006", "2006", "2006", "2007", "2007", "2007", "2008", "2009", "2010", "2010", "2009", "2009"),
  JudicialOrientation = c("Defense", "Plaintiff", "Plaintiff", "Neutral", "Defense", "Plaintiff", "Defense", "Plaintiff", "Neutral", "Neutral", "Plaintiff", "Defense"),
  Loss = c(100000, 100, 2500, 100000, 25000, 0, 7500, 5200, 900, 100, 0, 50)
)

df1 <- df %>%
  group_by(Year, JudicialOrientation) %>%
  summarise(MeanLoss = mean(Loss))

ggplot(df1, aes(x = JudicialOrientation, y = MeanLoss, color = Year, group = Year)) +
  geom_line() +
  geom_point()

I am now trying to wrap this in a function so that I can pass different variables and get similar results.

Here is my attempt so far:

ConsistencyPlot <- function(df,var1,timevar,lossvar){

  df1 <- df %>%
    group_by_(df[timevar], df[var1]) %>%
    summarise_(MeanLoss = mean(df[lossvar]))

  ggplot(df1, aes(x = var1, y = MeanLoss, color = timevar, group = timevar)) +
    geom_line() +
    geom_point()

}

ConsistencyPlot(df,"JudicialOrientation","Year",'Loss')

I am replicating the same logic, passing in df as my data frame, var1 as JudicialOrientation, timevar as Year, and lossvar as the column of Loss values that I want averaged through summarise. However, I cannot get the same results, so I feel like I am missing something about how these functions behave inside a closure.

tjebo
Coldchain9

1 Answer


First of all, inside dplyr functions you refer to columns by name, so there is no need to index the data frame with df[timevar] or df[, timevar]. Note that df[timevar] is valid R on its own (it returns a one-column data frame), but group_by_() and summarise_() expect column names or expressions, not data frames, which is why the attempt fails.

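As for the function, the problem is one of evaluation: dplyr verbs and aes() evaluate their arguments inside the data frame, so a function argument such as var1 is not automatically replaced by the column the caller named.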
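To see the evaluation problem in isolation, here is a minimal sketch. The toy data frame and the helper names naive_mean and quoted_mean are hypothetical and only serve as illustration; the sketch also assumes no object named v exists in the calling environment.

```r
library(dplyr)

# Hypothetical toy data, not from the question
toy <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 8))

# Naive wrapper: `column` is not a column of `data`, so R falls back to the
# function argument and tries to evaluate `v` in the caller's environment,
# where (by assumption) no such object exists.
naive_mean <- function(data, column) {
  data %>% summarise(m = mean(column))
}

# Wrapper that captures the caller's expression with enquo() and
# unquotes it with !! so that it is evaluated inside the data frame.
quoted_mean <- function(data, column) {
  column <- enquo(column)
  data %>% summarise(m = mean(!!column))
}

res_naive <- try(naive_mean(toy, v), silent = TRUE)
naive_failed <- inherits(res_naive, "try-error")  # TRUE: `v` was not found

res_quoted <- quoted_mean(toy, v)$m               # 4, the mean of 1, 3 and 8
```

The only difference between the two helpers is the enquo()/!! pair, which is what lets the bare column name travel from the caller into summarise().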

For your function, the following structure works:

ConsistencyPlot <- function(df, var1, timevar, lossvar){
  var1 <- enquo(var1)
  timevar <- enquo(timevar)
  lossvar <- enquo(lossvar)

  df1 <- df %>%
    group_by(!!timevar, !!var1) %>%
    summarise(MeanLoss = mean(!!lossvar))

  ggplot(df1, aes(x = !!var1, y = MeanLoss, color = !!timevar, group = !!timevar)) +
    geom_line() +
    geom_point()
}

Note that the arguments are captured with enquo() and then unquoted with !! inside the dplyr and ggplot2 calls. This lets you pass the column names without quoting them:

ConsistencyPlot(df, JudicialOrientation, Year, Loss)
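For reference, two variations on the same idea, offered as a sketch rather than as part of the original answer. It assumes dplyr 1.0.0 or later (for across() and the .groups argument) and rlang 0.4.0 or later (for the {{ }} operator). The function names consistency_plot_embrace and consistency_plot_strings and the small sample data frame are hypothetical. The first variant uses {{ }} as shorthand for enquo() plus !!; the second keeps the quoted column names from the original call by using all_of() and the .data pronoun.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample data with the same column names as in the question
sample_df <- data.frame(
  Year = c("2006", "2006", "2007", "2007"),
  JudicialOrientation = c("Defense", "Plaintiff", "Defense", "Plaintiff"),
  Loss = c(100, 200, 300, 400)
)

# Variant 1: bare column names, with {{ }} replacing enquo() followed by !!
consistency_plot_embrace <- function(df, var1, timevar, lossvar) {
  df1 <- df %>%
    group_by({{ timevar }}, {{ var1 }}) %>%
    summarise(MeanLoss = mean({{ lossvar }}), .groups = "drop")

  ggplot(df1, aes(x = {{ var1 }}, y = MeanLoss,
                  color = {{ timevar }}, group = {{ timevar }})) +
    geom_line() +
    geom_point()
}

# Variant 2: column names passed as strings, as in the question's call
consistency_plot_strings <- function(df, var1, timevar, lossvar) {
  df1 <- df %>%
    group_by(across(all_of(c(timevar, var1)))) %>%
    summarise(MeanLoss = mean(.data[[lossvar]]), .groups = "drop")

  ggplot(df1, aes(x = .data[[var1]], y = MeanLoss,
                  color = .data[[timevar]], group = .data[[timevar]])) +
    geom_line() +
    geom_point() +
    labs(x = var1, color = timevar)  # readable labels instead of ".data[[var1]]"
}

p_embrace <- consistency_plot_embrace(sample_df, JudicialOrientation, Year, Loss)
p_strings <- consistency_plot_strings(sample_df, "JudicialOrientation", "Year", "Loss")
```

Which variant to prefer depends mostly on how the function will be called: bare names read more like ordinary dplyr code, while strings are easier to supply programmatically, for example from a vector of column names.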

I hope you find it useful.

Bruno Pinheiro
  • I realized I was referencing column names wrong right after I posted this question. Can you explain to me what !! is doing? This is exactly what I wanted. Thank you very much. – Coldchain9 Nov 21 '18 at 15:19
  • It is an unquoting operator. See `?"!!"`. – Anonymous coward Nov 21 '18 at 15:20
  • It's exactly what @Anonymouscoward said. For a deeper explanation, take a look [here](https://adv-r.hadley.nz/evaluation.html#quoting-and-unquoting). Happy to help. – Bruno Pinheiro Nov 21 '18 at 15:29
  • I guess I am just trying to figure out why does the quoting-unquoting methodology work vs just sending the arguments in their unquoted form. It can't recognize the variables if I send them in so why does it work when they are enquoted then immediately unquoted with !!. I read ?enquo and if I understand correctly, the quosure maintains the original environment but !! just removes the quotes for evaluation purposes? – Coldchain9 Nov 21 '18 at 16:12
  • @Coldchain9: see this for further explanation https://stackoverflow.com/questions/51738267/non-standard-evaluation-and-quasiquotation-in-dplyr-not-working-as-naively-e/51738431#comment91545917_51738267 – Tung Nov 21 '18 at 16:36