Understanding R non-standard evaluation with tidyverse (ggplot2)

Question

The following code yields a plot:

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

The identifier prop is not bound to anything, which is possible in R. Its name is used later, and I think that is just more convenient than using quoted strings.

prop names a statistic to compute (a proportion), so y = ... refers not to a column of the data but to that statistic.

after_stat delays the evaluation of prop, otherwise ggplot would search for a column named prop.

Not surprisingly, ?prop does not help:

> ?prop
No documentation for ‘prop’ in specified packages and libraries:
you could try ‘??prop’

So far so good.

When I want to investigate what after_stat actually does, I get:

> after_stat
function (x) 
{
    x
}
<bytecode: 0x9qe2fcx7815>
<environment: namespace:ggplot2>

My guess is: Somewhere, an evaluation of after_stat(prop) notices that there is a function/thing with a name after_stat (instead of an unspecified identifier).

Is this a trick? The "identity" function seems to be of no use otherwise.

More to review if you look through the source for `ggplot2`: https://github.com/tidyverse/ggplot2/blob/87e9b85dd9f2a294f339d88a353d0c11c851489d/R/aes-evaluation.r. You'll see other functions like `is_calculated` which check if `after_stat` was called for that argument. — Jon Spring, Aug 31 '21 at 08:15
see the "computed variables" section of the geom you want, `?geom_bar` in your case, these are variables that are computed by ggplot but are accessible to the user, `?aes_eval` has an explanation. these took the form of `..variable..` in earlier versions of ggplot but still work (i guess ggplot finally learned it is bad to break user space) — rawr, Aug 31 '21 at 08:16
`stat_prop` is a variation of `ggplot2::stat_count()` allowing to compute custom proportions according to the by aesthetic defining the denominator : http://ggobi.github.io/ggally/articles/ggally_stats.html#stat-prop-, http://ggobi.github.io/ggally/reference/stat_prop.html — Park, Aug 31 '21 at 08:20
@Park yes, I know, I was more concerned with how it works internally. — Xiphias, Aug 31 '21 at 08:41
Just nitpicking your title: this is really `ggplot2` behaviour, probably using other `tidyverse` packages built on R capabilities. So it would make more sense to use "Understanding `ggplot2` non-standard evaluation" to summarize the question. — user2554330, Aug 31 '21 at 10:06

teunbrand · Answer 1 · 2021-08-31T15:10:52.903

Many stat layers (notable exception is stat_identity()) have 'computed variables'. You'll see these computed variables documented in the 'Computed variables' section of, for example ?stat_density. These variables become columns in the working data for a layer, which you can inspect by calling layer_data(last_plot()) after printing a plot.
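
As a small illustration of the paragraph above, the following sketch inspects the working data of the bar layer from the question. It assumes ggplot2 is installed and uses its bundled diamonds dataset. The plot object is passed to layer_data() directly rather than via last_plot(), so nothing has to be printed first.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# layer_data() builds the plot and returns the working data of the first layer
ld <- layer_data(p)

# The computed variables `count` and `prop` show up as ordinary columns
names(ld)
ld[, c("x", "count", "prop")]
```

The `y` column should hold the same values as `prop`, since that is what the mapping asked for.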

The after_stat() function, as you pointed out, does nothing to the data. However, because the expressions are captured before being evaluated, ggplot2 can parse the language in the expression to see if the expression contains a call to after_stat() (and a few others). The logic of this is in the source code that Jon Spring pointed out.
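
To make the "captured before being evaluated" idea concrete, here is a minimal base R sketch of the same trick. It is not ggplot2's actual implementation: `my_aes()` and `uses_after_stat()` are hypothetical helpers invented for this illustration, and the local `after_stat()` is only a stand-in for the identity function shown in the question.

```r
# Stand-in for ggplot2's after_stat(): an identity function used only as a marker
after_stat <- function(x) x

# Hypothetical helper: capture the arguments as unevaluated expressions
my_aes <- function(...) {
  as.list(substitute(list(...)))[-1]
}

# Hypothetical helper: TRUE if the expression contains a call to after_stat()
uses_after_stat <- function(expr) {
  if (is.call(expr)) {
    if (identical(expr[[1]], quote(after_stat))) {
      return(TRUE)
    }
    # Recurse into the function position and all arguments of the call
    return(any(vapply(as.list(expr), uses_after_stat, logical(1))))
  }
  FALSE
}

# Neither `cut` nor `prop` is looked up here; only the syntax is inspected
mapping <- my_aes(x = cut, y = after_stat(prop) * 100)
flags <- vapply(mapping, uses_after_stat, logical(1))
print(flags)
```

Because the expressions are never evaluated at this point, it does not matter that `prop` is unbound. The marker function only needs to exist by name so that the call can be recognised in the captured syntax tree.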

If you look at the compute_aesthetics method of a layer, you can see that these marked expressions are excluded from evaluation at that stage.

> geom_point()$compute_aesthetics
<ggproto method>
  <Wrapper function>
    function (...) 
f(..., self = self)

  <Inner function (f)>
    function (self, data, plot) 
{
 # ... omitted for brevity
 calculated <- is_calculated_aes(aesthetics)
 modifiers <- is_scaled_aes(aesthetics)
 aesthetics <- aesthetics[!set & !calculated & !modifiers]
 # ... 
 evaled <- lapply(aesthetics, eval_tidy, data = data, env = env)
 # ...
}

For after_stat(), these aesthetics are evaluated after the statistic has been computed, in e.g. the geom_point()$map_statistic method. I'm not 100% sure what is going on here, but it looks like a special environment is built for the evaluation of the modified expressions. I think the data mask ensures that the evaluation takes place in the context of the working data and thus has access to the computed variables (which the global or user-specified layer data do not have).

> geom_point()$map_statistic
<ggproto method>
  <Wrapper function>
    function (...) 
f(..., self = self)

  <Inner function (f)>
    function (self, data, plot) 
{
 # ... omitted for brevity
 new <- strip_dots(aesthetics[is_calculated_aes(aesthetics) | 
        is_staged_aes(aesthetics)])
 if (length(new) == 0) 
        return(data)
 env <- child_env(baseenv(), stat = stat, after_stat = after_stat)
 stage_mask <- child_env(emptyenv(), stage = stage_calculated)
 mask <- new_data_mask(as_environment(data, stage_mask), stage_mask)
 mask$.data <- as_data_pronoun(mask)
 new <- substitute_aes(new)
 stat_data <- lapply(new, eval_tidy, mask, env)
 # ...
}
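
The delayed evaluation can be imitated in base R. The sketch below is a simplified analogue under stated assumptions, not what ggplot2 does internally. It uses eval() with a data frame instead of rlang's data mask, a toy proportion statistic, and made-up data (`raw`, `stat_data`, and `y_expr` are names chosen for this illustration).

```r
# Stand-in for ggplot2's after_stat(): plain identity
after_stat <- function(x) x

# Toy layer data: there is no `prop` column yet
raw <- data.frame(cut = c("Fair", "Good", "Good", "Ideal"))

# The mapping is captured as an expression, not evaluated
y_expr <- quote(after_stat(prop) * 100)

# Evaluating against the raw data fails, because `prop` does not exist yet
early <- tryCatch(
  eval(y_expr, raw, enclos = environment()),
  error = function(e) "error"
)

# Toy "stat": count the rows per category and compute proportions
counts <- table(raw$cut)
stat_data <- data.frame(
  x = names(counts),
  count = as.vector(counts),
  prop = as.vector(counts) / sum(counts)
)

# Evaluating the same expression against the computed data now succeeds
late <- eval(y_expr, stat_data, enclos = environment())
print(late)
```

The point is only the ordering: the same captured expression is meaningless before the statistic runs and well defined afterwards, which is why the marked aesthetics are held back until the computed columns exist.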
