I have a data frame, and I want ggplot2 to color points according to a column whose name is stored in a string variable:

cubbins <- tibble(x=1:10, y=c(1:5, 1:5), title=rep(c("500 hats", "Oobleck"), 5))
my_col <- "title"

cubbins %>% ggplot(aes_string(x="x", y="y", color=my_col)) + geom_point() ## works, deprecated
cubbins %>% ggplot(aes(x=x, y=y, color=!!my_col)) + geom_point() ## wrong
cubbins %>% ggplot(aes(x=x, y=y, color={{my_col}})) + geom_point() ## same wrong
cubbins %>% ggplot(aes(x=x, y=y, color=sym(my_col))) + geom_point() ## invalid aesthetics warning
cubbins %>% ggplot(aes(x=x, y=y, color=enquo(my_col))) + geom_point() ## invalid aesthetics warning

cubbins %>% ggplot(aes(x=x, y=y, color=.data[[my_col]])) + geom_point() ## this is what's recommended in the current aes_string documentation
cubbins %>% ggplot(aes(x=x, y=y, color=!!sym(my_col))) + geom_point() ## WORKS, please explain

The last version works, but I'm confused about why all the other versions aren't working. Your assistance is appreciated.

[Image: wrong version, plot using embrace incorrectly]
[Image: correct version, correct plot]

For background, aes_string is deprecated, but I didn't find any explanation for the proper workaround anywhere in my brief googling.

General reference: https://rlang.r-lib.org/reference/topic-metaprogramming.html

zx8754
flies
  • https://stackoverflow.com/questions/74414272/how-to-replace-the-deprecated-ggplot2-function-aes-string-accepting-an-arbitrar#string-accepting-an-arbitrar This question is similar, but more complex. It suggests using `sym` in one of the answers, but it's not easy (for me) to see immediately that `!!sym(x)` solves the problem. – flies Jul 13 '23 at 21:57
  • more related info https://cmdlinetips.com/2018/07/ggplot2-version-3-0-0-brings-tidy-evaluation-to-ggplot/ – flies Jul 13 '23 at 22:00
  • SO isn't really the right forum for this question, but this is explained in detail in Advanced R ([free online](https://adv-r.hadley.nz/index.html) or [buy it](https://www.amazon.com/Advanced-Second-Chapman-Hall-CRC-dp-0815384572/dp/0815384572/ref=dp_ob_title_bk)), specifically in the [quasiquotation](https://adv-r.hadley.nz/quasiquotation.html) and [evaluation](https://adv-r.hadley.nz/evaluation.html) chapters. Basically, your failed attempts all use non-standard evaluation (NSE); when you're using NSE, things get complicated, but `!!sym(var)` is an appropriate expression in this case. – jared_mamrot Jul 13 '23 at 23:18

1 Answer

We can actually use the aes function on its own to get a better handle on what's going on.

Essentially, we want something that gives us the same output as:

library(ggplot2)

aes(colour = title)
#> Aesthetic mapping: 
#> * `colour` -> `title`

Here, the colour aesthetic is mapped to the symbol title. But we want to achieve this using a stored, pre-defined character string:

my_col <- "title"

Previously, we could simply have used aes_string:

aes_string(color = my_col)
#> Aesthetic mapping: 
#> * `colour` -> `title`
#> Warning message:
#> `aes_string()` was deprecated in ggplot2 3.0.0.
#> i Please use tidy evaluation ideoms with `aes()`

This works, mapping colour to the symbol title, but as the warning tells us, this function is deprecated.

For completeness, let's look at aes(color = my_col), which you sensibly didn't even try:

aes(color = my_col)
#> Aesthetic mapping: 
#> * `colour` -> `my_col`

Of course, the aes function has simply captured the symbol my_col, which won't work because you have no column called my_col in your data frame.

Now we move on to the attempts using unquoting. Both the !! and {{ operators evaluate their operand and inline it in the AST, so they both evaluate to the string "title", not the symbol title. Note the quotation marks:

aes(color = !!my_col)
#> Aesthetic mapping: 
#> * `colour` -> "title"

aes(color = {{my_col}})
#> Aesthetic mapping: 
#> * `colour` -> "title"

This means that all the points on your plot will be given the same colour, since they are all mapped to a single string, which will just be recycled to the length of the data.
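
As a rough check of the point above, the captured expression can be pulled back out of the mapping and inspected. This is only a sketch: it assumes `rlang::get_expr()` hands back non-quosure inputs unchanged, and it uses ggplot2's `layer_data()` helper to count the colours that actually reach the layer. The `cubbins` data frame is rebuilt here with `data.frame()` so the block stands alone.

```r
library(ggplot2)
library(rlang)

cubbins <- data.frame(x = 1:10, y = c(1:5, 1:5),
                      title = rep(c("500 hats", "Oobleck"), 5))
my_col <- "title"

# What aes() captured in each case
captured_string <- get_expr(aes(color = !!my_col)$colour)
captured_symbol <- get_expr(aes(color = title)$colour)

is.character(captured_string)  # the inlined value is a string constant
is.symbol(captured_symbol)     # a bare column name is captured as a symbol

# A constant string is recycled, so every point ends up with the same colour
p <- ggplot(cubbins, aes(x = x, y = y, color = !!my_col)) + geom_point()
n_colours <- length(unique(layer_data(p)$colour))
```

If the explanation holds, `n_colours` should be 1 here, even though `title` has two distinct values in the data.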

You might think that sym or enquo should work, but the problem is that without an unquoting operator, the whole unevaluated expression is captured. Look what happens with sym and enquo:

aes(color = sym(my_col))
#> Aesthetic mapping: 
#> * `colour` -> `sym(my_col)`

aes(color = enquo(my_col))
#> Aesthetic mapping: 
#> * `colour` -> `enquo(my_col)`

Now we are trying to map colour to a call, which doesn't even make sense, and ggplot will just throw an error. We need our mapping to be to a symbol, not a call. However, we can evaluate and inline sym(my_col) using the !! operator to get the symbol title. If we use enquo then we will again get a string as the output of the evaluation, and the {{ operator only works on names, not calls, so the solution is:

aes(color = !!sym(my_col))
#> Aesthetic mapping: 
#> * `colour` -> `title`
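
The same idiom carries over to a function that takes the column name as an argument. The helper below, `scatter_by()`, is a hypothetical name made up for illustration, not part of ggplot2; it is a minimal sketch assuming the data always has columns `x` and `y`.

```r
library(ggplot2)
library(rlang)

# Hypothetical helper: colour a scatter plot by a column named in a string
scatter_by <- function(data, col_name) {
  # sym() turns the string into a symbol; !! inlines that symbol into aes()
  ggplot(data, aes(x = x, y = y, color = !!sym(col_name))) +
    geom_point()
}

cubbins <- data.frame(x = 1:10, y = c(1:5, 1:5),
                      title = rep(c("500 hats", "Oobleck"), 5))

p <- scatter_by(cubbins, "title")

# One distinct colour per level of `title`
n_colours <- length(unique(layer_data(p)$colour))
```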

I actually prefer this to the recommended method, which results in:

aes(color = .data[[my_col]])
#> Aesthetic mapping: 
#> * `colour` -> `.data[[my_col]]`

This works perfectly well, but only because ggplot recognises the .data pronoun and treats it differently.
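
For comparison, here is a small sketch that builds the plot both ways and checks that the point colours come out the same. One practical difference worth keeping in mind: `.data[[my_col]]` looks only in the data, whereas a bare symbol can fall back to a variable of the same name in the calling environment if no such column exists.

```r
library(ggplot2)

cubbins <- data.frame(x = 1:10, y = c(1:5, 1:5),
                      title = rep(c("500 hats", "Oobleck"), 5))
my_col <- "title"

# Recommended replacement for aes_string(): the .data pronoun
p_data <- ggplot(cubbins, aes(x = x, y = y, color = .data[[my_col]])) +
  geom_point()

# The unquoted-symbol alternative
p_sym <- ggplot(cubbins, aes(x = x, y = y, color = !!rlang::sym(my_col))) +
  geom_point()

# Compare the colours assigned to each point in the built layers
same_colours <- identical(layer_data(p_data)$colour,
                          layer_data(p_sym)$colour)
```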

There are other options. Anything that gets us the correct symbol will work:

aes(color = !!str2lang(my_col))
#> Aesthetic mapping: 
#> * `colour` -> `title`

aes(color = !!as.name(my_col))
#> Aesthetic mapping: 
#> * `colour` -> `title`
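
These alternatives are only interchangeable for simple, syntactic column names. As a hedged illustration: `str2lang()` parses its input as R code, so a name containing a space fails to parse and a string such as "x + y" becomes a call, whereas `as.name()` and `sym()` always produce a single symbol.

```r
library(rlang)

# For a plain name, all three give the same symbol
same_sym  <- identical(sym("title"), as.name("title"))
same_lang <- identical(str2lang("title"), as.name("title"))

# A non-syntactic name is fine for as.name() and sym() ...
odd_name <- "500 hats"
odd_is_symbol <- is.symbol(as.name(odd_name)) && is.symbol(sym(odd_name))

# ... but str2lang() tries to parse it as code and fails
odd_parse_fails <- inherits(
  tryCatch(str2lang(odd_name), error = identity),
  "error"
)

# And a string that looks like an expression is parsed into a call
parsed_is_call <- is.call(str2lang("x + y"))
```

So if the column name might contain spaces or operators, `sym()` or `as.name()` is likely the safer choice.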

And it's even possible to build the aesthetic mapping from scratch using only base R functions if you really wanted to:

ggplot(cubbins, 
  structure(
    list(colour = structure(
      `attr<-`(call("~", str2lang(my_col)), ".Environment", .GlobalEnv), 
      class = c("quosure", "formula")), 
    x = structure(
      `attr<-`(call("~", str2lang("x")), ".Environment", .GlobalEnv), 
      class = c("quosure", "formula")),
    y = structure(
      `attr<-`(call("~", str2lang("y")), ".Environment", .GlobalEnv), 
      class = c("quosure", "formula"))
    ), 
    class = "uneval")) + geom_point()


Allan Cameron