
Doesn't work:

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=F)
xcol <- "Col 1"
ycol <- "Col 2"
ggplot(data=mydat, aes_string(x=xcol, y=ycol)) + geom_point()

Works:

mydat <- data.frame(`A`=1:5, `B`=1:5)
xcol <- "A"
ycol <- "B"
ggplot(data=mydat, aes_string(x=xcol, y=ycol)) + geom_point()

Works:

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=F)
ggplot(data=mydat, aes(x=`Col 1`, y=`Col 2`)) + geom_point()

What's the issue?

thc
  • The docs for `aes_string` show that 1. weirdly named columns don't always work well (see the second-to-last set of examples), and 2. `aes_string` and `aes_` are being deprecated in favor of tidyeval – camille Aug 02 '18 at 16:49
  • @camille Thanks, do you have a link to explaining tidyeval? – thc Aug 02 '18 at 19:19
  • Sure, here's one: https://colinfay.me/tidyeval-1/ – camille Aug 02 '18 at 19:33
  • Also, it's interesting to see answers to [this post](https://stackoverflow.com/q/45439813/5325862), since a few are from before tidyeval was implemented in ggplot, and a few are from post-implementation – camille Aug 02 '18 at 19:35

3 Answers


UPDATE: Note that in more recent versions of ggplot2, the use of aes_string is discouraged. Instead, if you need to get a column value from a string, use the .data pronoun:

ggplot(data=mydat, aes(x=.data[[xcol]], y=.data[[ycol]])) + geom_point()
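A minimal self-contained sketch of the same `.data` idea, wrapped in a small helper that takes column names as strings. The function name `plot_cols` is hypothetical, and this assumes ggplot2 3.0.0 or later (where `.data` can be used inside `aes()`). The explicit `labs()` call sets the axis titles to the original strings, so the spaces show up in the plot.

```r
library(ggplot2)

mydat <- data.frame(`Col 1` = 1:5, `Col 2` = 1:5, check.names = FALSE)

# Hypothetical helper: column names are passed as plain strings
plot_cols <- function(data, xcol, ycol) {
  ggplot(data, aes(x = .data[[xcol]], y = .data[[ycol]])) +
    geom_point() +
    # use the original strings as axis titles
    labs(x = xcol, y = ycol)
}

p <- plot_cols(mydat, "Col 1", "Col 2")
print(p)
```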

ORIGINAL ANSWER: Values passed to aes_string are parse()-d. This is because you can pass things like aes_string(x="log(price)"), where you aren't passing a column name but an expression. So it treats your string as an expression, and when it goes to parse it, it finds the space, which makes it an invalid expression. You can "fix" this by wrapping column names in quotes. For example, this works:

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=F)
xcol <- "Col 1"
ycol <- "Col 2"
ggplot(data=mydat, aes_string(x=shQuote(xcol), y=shQuote(ycol))) + geom_point()

We just use shQuote() to put quotes around our values. You could also have embedded backticks in the string, as you did in your other example:

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=F)
xcol <- "`Col 1`"
ycol <- "`Col 2`"
ggplot(data=mydat, aes_string(x=xcol, y=ycol)) + geom_point()
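To see the parsing problem without ggplot2 in the way, here is a sketch that calls base R's `parse()` directly on the three kinds of strings. It only mimics the parse step described above; it does not call `aes_string()` itself. Note that `shQuote()` uses single quotes by default on Unix-alikes, and in either case its output parses to a string constant rather than a column reference. That may explain why the quoted version maps a single constant value in newer ggplot2 releases, as reported in the comments.

```r
# Mimic the parse() step aes_string() applies to character input
bad <- tryCatch(parse(text = "Col 1"), error = function(e) e)
print(inherits(bad, "error"))  # TRUE: a syntax error at the space

# Backticks turn the text into a single symbol (a column reference)
ok <- parse(text = "`Col 1`")[[1]]
print(is.name(ok))             # TRUE
print(as.character(ok))        # "Col 1"

# shQuote() output parses to a character constant, not a symbol
quoted <- parse(text = shQuote("Col 1"))[[1]]
print(is.character(quoted))    # TRUE
```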

But the best way to deal with this is to avoid column names that are not valid variable names.
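If renaming is an option, base R's `make.names()` converts names into syntactically valid ones (this is also what `data.frame()` does when `check.names` is left at its default `TRUE`). A minimal sketch:

```r
library(ggplot2)

mydat <- data.frame(`Col 1` = 1:5, `Col 2` = 1:5, check.names = FALSE)

# make.names() replaces invalid characters (here, the space) with "."
names(mydat) <- make.names(names(mydat))
print(names(mydat))  # "Col.1" "Col.2"

# Bare names now work directly in aes()
p <- ggplot(mydat, aes(x = Col.1, y = Col.2)) + geom_point()
print(p)
```

If the spaces matter for display, a `labs()` call can restore them on the axes.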

MrFlick
  • Thanks, time to re-familiarize myself with tidyverse =/ – thc Aug 02 '18 at 19:18
  • Well, this isn't the "tidyverse" way to do things any more. This is the legacy ggplot way. With modern tidyverse programming you would use quosures (or `expr()` or `sym()`) and expand those into `aes()`. Still doesn't really help with column names with spaces though. Those are just evil. – MrFlick Aug 02 '18 at 19:20
  • I disagree that names with spaces are "not valid variable names". For example, you can do this: `` `x 2` <- 1 `` or use an explicit assign to the global environment without issue. – thc Aug 02 '18 at 21:39
  • @thc Ok. I guess I meant variable names that don’t require being surrounded by quotes. You can never use that name without also typing the quotes. And most people go to great lengths just to avoid a few extra characters (à la non-standard evaluation) – MrFlick Aug 02 '18 at 21:43
  • @thc I mean you can also do `` `4$.^` <- 3 ``, but I would be reluctant to call that a valid variable name. The ticks really let you circumvent the normal variable name rules. – MrFlick Aug 02 '18 at 21:48
  • This is the accepted answer but it doesn't work today. I see only one x-axis value (Col 1) and one y-axis value (Col 2). Can you update the answer for current ggplot2 version 3.3.5? – Dario Oct 22 '21 at 05:45
  • @Dario I've updated the answer. Basically you shouldn't use `aes_string` any more. Instead use the `.data` pronoun. – MrFlick Oct 22 '21 at 17:43
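Following the `sym()` route mentioned in the comments, here is a sketch of turning the strings into symbols and splicing them into `aes()` with `!!`. It assumes rlang (a ggplot2 dependency) is installed. `rlang::sym()` accepts names containing spaces because it builds the symbol directly instead of parsing text.

```r
library(ggplot2)

mydat <- data.frame(`Col 1` = 1:5, `Col 2` = 1:5, check.names = FALSE)
xcol <- "Col 1"
ycol <- "Col 2"

# rlang::sym() builds a symbol from the string without parsing it,
# so the space is fine; !! splices that symbol into aes()
p <- ggplot(mydat, aes(x = !!rlang::sym(xcol), y = !!rlang::sym(ycol))) +
  geom_point()
print(p)
```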

Here's a tidyeval approach, which is what the tidyverse development crew is moving towards in place of aes_ or aes_string. Tidyeval is tricky at first, but pretty well documented.

This recipe sheet isn't ggplot-specific, but it's on my bookmarks toolbar because it's pretty handy.

In this case, you want to write a function to handle making your plot. This function takes a data frame and two bare column names as arguments. Then you turn the column names into quosures with enquo, then !! unquotes them for use in aes.

library(ggplot2)

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=F)

pts <- function(data, xcol, ycol) {
  x_var <- enquo(xcol)
  y_var <- enquo(ycol)
  ggplot(data, aes(x = !!x_var, y = !!y_var)) +
    geom_point()
}

pts(mydat, `Col 1`, `Col 2`)

But also like @MrFlick said, do whatever you can to just use valid column names, because why not?
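Since rlang 0.4.0, the `enquo()` + `!!` pair can be written with the `{{ }}` ("curly-curly") operator, which `aes()` also understands in recent ggplot2 versions. Here is a sketch of the same function under that assumption. The name `pts2` is just illustrative.

```r
library(ggplot2)

mydat <- data.frame(`Col 1` = 1:5, `Col 2` = 1:5, check.names = FALSE)

# {{ }} forwards the bare column names into aes(),
# doing the enquo() + !! steps in one go
pts2 <- function(data, xcol, ycol) {
  ggplot(data, aes(x = {{ xcol }}, y = {{ ycol }})) +
    geom_point()
}

p <- pts2(mydat, `Col 1`, `Col 2`)
print(p)
```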

camille
  • Thanks. It's mostly because `read_xl` preserves spaces, and it saves having to perform re-labeling of axes. – thc Aug 02 '18 at 21:41
  • But adding a `labs` line to your plot is probably easier than having to write a whole tidyeval wrapper function, no? – camille Aug 03 '18 at 00:37
  • It's still useful to be able to do this because you might be writing a plotting function that end-users of your R package will use and it's nice to automatically set axis labels on the plot which do not look like programmer variable names and saves your end-user from having to manually add nice `labs` themselves. – Dario Oct 22 '21 at 05:39

To whom it may still concern: if the column name happens to contain a space or math symbols such as >, <, or =, one easy workaround is to wrap the string in as.name() when passing it to aes_string().
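Here is a minimal sketch of that workaround. It assumes a ggplot2 version that still ships `aes_string()`, which is deprecated, so a deprecation warning is likely. `as.name()` returns a symbol rather than a character string, so `aes_string()` passes it through without calling `parse()` on it. The column name `a > b` is a made-up example.

```r
library(ggplot2)

mydat <- data.frame(`Col 1` = 1:5, `a > b` = 6:10, check.names = FALSE)
xcol <- "Col 1"
ycol <- "a > b"

# as.name() yields the symbols `Col 1` and `a > b`; no parse() step is involved
p <- ggplot(mydat, aes_string(x = as.name(xcol), y = as.name(ycol))) +
  geom_point()
print(p)
```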

TQCH