
I mostly use ggplot2 for visualizations. Typically, I design the plot interactively (i.e. raw ggplot2 code that uses NSE), but in the end I frequently wrap that code into a function that receives the data and the variables to plot. And this is always a bit of a nightmare.

The typical situation looks like this: I have some data and I create a plot for it (here, a very simple example using the mpg dataset that comes with ggplot2).

library(ggplot2)
data(mpg)

ggplot(data = mpg, 
       mapping = aes(x = class, y = hwy)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")


When I finish designing the plot, I typically want to reuse it for different variables, data, etc. So I create a function that receives the data and the variables to plot as arguments. But because of NSE, it is not as easy as writing a function header, pasting in the code and replacing the variables with the function arguments. That does not work, as shown below.

mpg <- mpg
plotfn <- function(data, xvar, yvar){
    ggplot(data = data, 
           mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}
plotfn(mpg, class, hwy) # Can't find object

## Don't know how to automatically pick scale for object of type function. Defaulting to continuous.

## Warning: restarting interrupted promise evaluation

## Error in eval(expr, envir, enclos): object 'hwy' not found

plotfn(mpg, "class", "hwy") # Doesn't plot the columns either: xvar and yvar are just the strings "class" and "hwy"
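A minimal sketch of why the naive version fails, assuming ggplot2 3.0.0 or later (where `aes()` stores each argument as an rlang quosure). `aes()` records the expression `xvar` itself, not the column it should point to. When the plot is built, `xvar` is looked up first among the columns of `data` (there is no such column) and then in the function's environment, where it is a promise for the caller's `class`. That resolves to base R's `class()` function (hence the "object of type function" message), while `hwy` is simply not found. The helper `make_mapping()` is hypothetical and only mimics the body of the naive `plotfn()`.

```r
library(ggplot2)

# Hypothetical helper mimicking the aes() call inside the naive plotfn()
make_mapping <- function(xvar, yvar) aes(x = xvar, y = yvar)

# aes() does not evaluate its arguments here, so no error is raised yet
m <- make_mapping(class, hwy)

# The mapping holds the symbols `xvar` and `yvar`, not `class` and `hwy`
rlang::quo_get_expr(m$x)
rlang::quo_get_expr(m$y)
```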


So I have to go back and fix the code, for example by using aes_string instead of aes, which uses NSE (in this example it is rather easy, but for more complicated plots, with lots of transformations and layers, this becomes a nightmare).

plotfn <- function(data, xvar, yvar){
    ggplot(data = data, 
           mapping = aes_string(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}
plotfn(mpg, "class", "hwy") # Now this works
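`aes_string()` is deprecated in recent versions of ggplot2. A sketch of the same string-based function using the `.data` pronoun instead, assuming ggplot2 3.0.0 or later; `plotfn_str` is a hypothetical name chosen so it does not clash with the other `plotfn` definitions.

```r
library(ggplot2)

# String-based version: xvar and yvar are column names given as strings.
# .data[[...]] looks each name up in the plot's data only, never in the
# calling environment.
plotfn_str <- function(data, xvar, yvar) {
  ggplot(data = data,
         mapping = aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}

p <- plotfn_str(mpg, "class", "hwy")

# Because the arguments are ordinary strings, looping works directly
plots <- lapply(c("class", "fl", "drv"), plotfn_str, data = mpg, yvar = "hwy")
```

The trade-off is that bare names such as `class` are no longer accepted; callers always pass strings.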


And the thing is that I find NSE, and lazyeval too, very convenient. So I like to do something like this:

mpg <- mpg
plotfn <- function(data, xvar, yvar){
    data_gd <- data.frame(
        xvar = lazyeval::lazy_eval(substitute(xvar), data = data),
        yvar = lazyeval::lazy_eval(substitute(yvar), data = data))

    ggplot(data = data_gd, 
           mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}
plotfn(mpg, class, hwy) # Now this works

plotfn(mpg, "class", "hwy") # This still works

plotfn(NULL, rep(letters[1:4], 250), 1:100) # And even this craziness works


This gives my plot function a lot of flexibility. For example, you can pass quoted or unquoted variable names, and even pass the data itself instead of a variable name (somewhat abusing lazy evaluation).

But this has a huge problem. The function cannot be used programmatically.

dynamically_changing_xvar <- "class"
plotfn(mpg, dynamically_changing_xvar, hwy) 

## Error in eval(expr, envir, enclos): object 'dynamically_changing_xvar' not found

# This does not work, because it never finds the object 
# dynamically_changing_xvar in the data, and it does not get evaluated to 
# obtain the variable name (class)

So I cannot use loops (e.g. lapply) to produce the same plot for different combinations of variables, or data.
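To see why `dynamically_changing_xvar` is not found, here is a base-R approximation of the lookup. It assumes (as I understand lazyeval's defaults) that a captured symbol with no attached environment is evaluated inside the data with base R's environment as the only fallback, and that a string is first parsed into an expression. `resolve()` is a hypothetical helper, not part of lazyeval.

```r
# Hypothetical helper approximating lazyeval::lazy_eval(substitute(arg), data = data)
resolve <- function(expr, data) {
  # A string such as "class" is parsed into the expression `class`
  if (is.character(expr)) expr <- parse(text = expr)[[1]]
  # Look in the data first, then only in base R, so variables defined in the
  # caller's environment are never seen (data = NULL means "no columns")
  eval(expr, envir = data, enclos = baseenv())
}

df <- data.frame(class = c("compact", "suv"), hwy = c(29L, 17L),
                 stringsAsFactors = FALSE)

resolve(quote(hwy), df)    # 29 17
resolve("class", df)       # "compact" "suv"
resolve(quote(1:3), NULL)  # 1 2 3: base functions are still reachable

dynamically_changing_xvar <- "class"
try(resolve(quote(dynamically_changing_xvar), df))  # error: object not found
```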

So I thought I would abuse lazy, standard and non-standard evaluation even further, and combine them so that I get both the flexibility shown above and the ability to use the function programmatically. Basically, I use tryCatch to first lazy_eval the expression for each variable and, if that fails, to force the argument, parse the string it contains and evaluate that within the data.

plotfn <- function(data, xvar, yvar){
    data_gd <- NULL
    data_gd$xvar <- tryCatch(
        expr = lazyeval::lazy_eval(substitute(xvar), data = data),
        error = function(e) eval(envir = data, expr = parse(text=xvar))
    )
    data_gd$yvar <- tryCatch(
        expr = lazyeval::lazy_eval(substitute(yvar), data = data),
        error = function(e) eval(envir = data, expr = parse(text=yvar))
    )


    ggplot(data = as.data.frame(data_gd), 
           mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}

plotfn(mpg, class, hwy) # Now this works, again

plotfn(mpg, "class", "hwy") # This still works, again

plotfn(NULL, rep(letters[1:4], 250), 1:100) # And this craziness still works

# And now, I can also pass a local variable to the function, that contains
# the name of the variable that I want to plot
dynamically_changing_xvar <- "class"
plotfn(mpg, dynamically_changing_xvar, hwy) 


So, in addition to the aforementioned flexibility, I can now use a one-liner to produce the same plot for many different variables (or data).

lapply(c("class", "fl", "drv"), FUN = plotfn, yvar = hwy, data = mpg)

## [[1]]

## 
## [[2]]

## 
## [[3]]


Even though it is very practical, I suspect this is not good practice. But how bad a practice is it? That is my key question. What other alternatives can I use to have the best of both worlds?

Of course, I can see that this pattern can create problems. For example:

# If a variable in the global environment holds the name of the column
# I want to plot, but that variable's own name is also a column of the data
# passed to the function, the function uses that column, not the variable's content
drv <- "class"
plotfn(mpg, drv, hwy) # Here xvar on the plot is drv and not class
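The ambiguity can be reproduced without ggplot2. Below is a minimal sketch of the same "try the data first, then fall back to the argument's value" idea, reduced to a hypothetical `pick()` helper. One deliberate change from the function above: the fallback looks the string up with `data[[var]]` instead of `eval(parse(text = ...))`, so it only ever accepts a column name.

```r
# Hypothetical helper mirroring the tryCatch() logic of plotfn()
pick <- function(data, var) {
  expr <- substitute(var)
  if (is.character(expr)) expr <- parse(text = expr)[[1]]
  tryCatch(
    # First attempt: treat the argument as a column name or expression in `data`
    eval(expr, envir = data, enclos = baseenv()),
    # Fallback: force the argument in the caller's environment and use its
    # value (a string) as a column name
    error = function(e) data[[var]]
  )
}

d <- data.frame(class = c("compact", "suv"), drv = c("f", "4"),
                stringsAsFactors = FALSE)

pick(d, class)    # column class
pick(d, "class")  # column class

xcol <- "class"
pick(d, xcol)     # column class, found via the fallback

drv <- "class"
pick(d, drv)      # column drv, not class: the data wins silently
```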


And there are some (many?) other problems. But it seems to me that the benefits in terms of syntactic flexibility outweigh those issues. Any thoughts on this?

elikesprogramming
  • The best practice is to produce a pair of functions. One is NSE, the other SE. This is outlined in `vignette('nse')`. This does mean using `aes_` instead of `aes`. – Axeman Apr 18 '16 at 08:59
  • Thanks, ..., yeah, I was afraid that was going to be the answer. Although I see the benefits of dplyr & co. "consistent naming scheme: the SE is the NSE name with _ on the end", it always bugs me having to use a different function for programming and working interactively. – elikesprogramming Apr 19 '16 at 09:05

1 Answer


Extracting your proposed function for clarity:

library(ggplot2)
data(mpg)

plotfn <- function(data, xvar, yvar){
  data_gd <- NULL
  data_gd$xvar <- tryCatch(
    expr = lazyeval::lazy_eval(substitute(xvar), data = data),
    error = function(e) eval(envir = data, expr = parse(text=xvar))
  )
  data_gd$yvar <- tryCatch(
    expr = lazyeval::lazy_eval(substitute(yvar), data = data),
    error = function(e) eval(envir = data, expr = parse(text=yvar))
  )

  ggplot(data = as.data.frame(data_gd), 
         mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}

Such a function is generally quite useful, since you can freely mix strings, and bare variable names. But as you say, it may not always be safe. Consider the following contrived example:

class <- "drv"
Class <- "drv"
plotfn(mpg, class, hwy) 
plotfn(mpg, Class, hwy) 

What will your function generate? Will these be the same (they are not)? It's not really clear to me what the result will be. Programming with such a function may give unexpected results, depending on which variables exist in the data and which exist in the environment. Since a lot of people use variable names like x, xvar or count (even though they perhaps shouldn't), things can get messy.

Also, if I wanted to force one or the other interpretation of class, I can't.

I'd say it's kind of similar to using attach: convenient, but at some point it might bite you in your behind.

Therefore, I'd use an NSE and SE pair:

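plotfn <- function(data, xvar, yvar) {
  # NSE version: capture the unevaluated arguments and pass them to the SE version
  plotfn_(data, substitute(xvar), substitute(yvar))
}

plotfn_ <- function(data, xvar, yvar){
  # SE version: aes_() treats strings as constants, so turn them into names first
  if (is.character(xvar)) xvar <- as.name(xvar)
  if (is.character(yvar)) yvar <- as.name(yvar)
  ggplot(data = data, 
         mapping = aes_(x = xvar, y = yvar)) +
    geom_boxplot() + 
    geom_jitter(alpha = 0.1, color = "blue")
}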

Creating these is actually easier than your function, I think. You could opt to capture all arguments lazily with lazy_dots too.

Now we get more predictable results when using the safe SE version:

class <- "drv"
Class <- "drv"
plotfn_(mpg, class, 'hwy')
plotfn_(mpg, Class, 'hwy')

The NSE version is still affected though:

plotfn(mpg, class, hwy)
plotfn(mpg, Class, hwy)

(I find it mildly annoying that ggplot2::aes_ doesn't also take strings.)
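Since this answer was written, NSE in the tidyverse has changed: lazyeval has been superseded by rlang's tidy evaluation, and `aes_()` and `aes_string()` are deprecated in recent ggplot2. A sketch of the same NSE/SE pair with rlang, assuming ggplot2 3.0.0 or later. It reuses the names `plotfn`/`plotfn_`; `ensym()` accepts either a bare name or a string, and the SE version injects the string as a symbol with `!!sym()`.

```r
library(ggplot2)
library(rlang)

# SE version: column names as strings, turned into symbols and injected with !!
plotfn_ <- function(data, xvar, yvar) {
  ggplot(data = data,
         mapping = aes(x = !!sym(xvar), y = !!sym(yvar))) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}

# NSE version: ensym() captures a bare name (or a string) without evaluating
# it, and as_name() turns it into a string for the SE version
plotfn <- function(data, xvar, yvar) {
  plotfn_(data, as_name(ensym(xvar)), as_name(ensym(yvar)))
}

plotfn(mpg, class, hwy)      # bare names
plotfn(mpg, "class", "hwy")  # strings also work

xcol <- "drv"
plotfn_(mpg, xcol, "hwy")    # SE: always uses the value of xcol, i.e. column drv
```

With this split, the SE version never looks variables up in the data, so it cannot be confused by a global variable that shares a column's name.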

Axeman
  • yeah, I agree 100% with "Programming with such a function may give unexpected results, depending which variables exist in data and which exist in the environment.", ..., just that sometimes I kind of feel that the convenience of it outweighs the risk of getting bitten in my behind. – elikesprogramming Apr 30 '16 at 21:25
  • The last two lines of code did not work when I tried running them. – student Oct 11 '17 at 17:50
  • NSE in the tidyverse has changed, so that's very possible. – Axeman Oct 11 '17 at 17:56