36

I need to draw a scatterplot that addresses variables by their column numbers instead of their names, i.e. instead of ggplot(dat, aes(x=Var1, y=Var2)) I need something like ggplot(dat, aes(x=dat[,1], y=dat[,2])). (I say 'something' because the latter doesn't work.)

Here is my code:

showplot1<-function(indata, inx, iny){
  dat<-indata
  print(nrow(dat)); # this is just to show that object 'dat' is defined
  p <- ggplot(dat, aes(x=dat[,inx], y=dat[,iny]))
  p + geom_point(size=4, alpha = 0.5)
}

testdata<-data.frame(v1=rnorm(100), v2=rnorm(100), v3=rnorm(100), v4=rnorm(100), v5=rnorm(100))
showplot1(indata=testdata, inx=2, iny=3)
# Error in eval(expr, envir, enclos) : object 'dat' not found
zx8754
Vasily A

8 Answers

26

Your problem is that aes() doesn't know about your function's environment: it looks for variables in the data frame and then only in the global environment. So the variable dat defined inside the function is not visible to ggplot2's aes() unless you pass the environment explicitly:

showplot1<-function(indata, inx, iny) {
    dat <- indata
    p <- ggplot(dat, aes(x=dat[,inx], y=dat[,iny]), environment = environment())
    p <- p + geom_point(size=4, alpha = 0.5)
    print(p)
}

Note the argument environment = environment() inside the ggplot() command. It should work now.

Arun
  • yes, I suspected that it was the problem of scope, now it's clear, and I like this solution even more. Thanks! – Vasily A Mar 10 '13 at 14:55
17

I strongly suggest using aes_q instead of passing vectors to aes (@Arun's answer). It may look a bit more complicated, but it is more flexible, e.g. when updating the data.

showplot1 <- function(indata, inx, iny){
  p <- ggplot(indata, 
              aes_q(x = as.name(names(indata)[inx]), 
                    y = as.name(names(indata)[iny])))
  p + geom_point(size=4, alpha = 0.5)
}

And here's the reason why it is preferable:

# test data (using non-standard names)
testdata<-data.frame(v1=rnorm(100), v2=rnorm(100), v3=rnorm(100), v4=rnorm(100), v5=rnorm(100))
names(testdata) <- c("a-b", "c-d", "e-f", "g-h", "i-j")
testdata2 <- data.frame(v1=rnorm(100), v2=rnorm(100), v3=rnorm(100), v4=rnorm(100), v5=rnorm(100))
names(testdata2) <- c("a-b", "c-d", "e-f", "g-h", "i-j")

# works
showplot1(indata=testdata, inx=2, iny=3)
# this update works in the aes_q version
showplot1(indata=testdata, inx=2, iny=3) %+% testdata2

Note: As of ggplot2 v2.0.0 aes_q() has been replaced with aes_() to be consistent with SE versions of NSE functions in other packages.
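
For readers on ggplot2 v2.0.0 or later, the same function might look roughly like the sketch below. The name showplot_se is made up here to avoid clashing with showplot1, and recent ggplot2 versions may print a deprecation warning for aes_(), so treat this as an illustration rather than the recommended form.

```r
library(ggplot2)

# Hypothetical name (not from the answer above): same idea with aes_()
showplot_se <- function(indata, inx, iny) {
  nms <- names(indata)
  # as.name() turns each column name into a symbol, so non-standard
  # names such as "c-d" are treated as a single variable
  ggplot(indata, aes_(x = as.name(nms[inx]), y = as.name(nms[iny]))) +
    geom_point(size = 4, alpha = 0.5)
}

testdata <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100))
names(testdata) <- c("a-b", "c-d", "e-f")
p <- showplot_se(testdata, inx = 2, iny = 3)
```

Printing p draws the plot; because the mapping refers to the columns by name, replacing the data with %+% should keep working as in the aes_q version.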

zx8754
shadow
  • indeed, your solution looks more flexible, I also like that it makes correct axes titles... Thanks! – Vasily A Jun 03 '15 at 19:56
  • As of ggplot2 v2.0.0: aes_q() has been replaced with aes_() to be consistent with SE versions of NSE functions in other packages https://github.com/hadley/ggplot2/blob/master/NEWS.md#deprecated-features – Tung Mar 01 '16 at 22:40
  • This answer worked well for me too, thank you. I must say, ggplot doesn't seem to be designed for people who work with very large numbers of quantities (rows in the data frame). – pglpm Jul 01 '21 at 10:51
13

Try:

showplot1 <- function(indata, inx, iny) {
    x <- names(indata)[inx] 
    y <- names(indata)[iny] 
    p <- ggplot(indata, aes_string(x = x, y = y))
    p + geom_point(size=4, alpha = 0.5)
}

Edited to show what's happening: aes_string() takes the variable names as character strings, and names(indata)[inx] looks those names up from your column numbers.
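
Because aes_string() parses each string as an R expression, a name such as "c-d" is read as c minus d. One possible workaround, sketched here under the assumption that no column name itself contains a backtick, is to wrap the names in backticks first. The function name showplot_str is made up for this illustration, and recent ggplot2 versions may warn that aes_string() is deprecated.

```r
library(ggplot2)

# Hypothetical name (not from the answer above)
showplot_str <- function(indata, inx, iny) {
  # Backticks make "c-d" parse as one symbol instead of the expression c - d
  x <- paste0("`", names(indata)[inx], "`")
  y <- paste0("`", names(indata)[iny], "`")
  ggplot(indata, aes_string(x = x, y = y)) +
    geom_point(size = 4, alpha = 0.5)
}

testdata <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100))
names(testdata) <- c("a-b", "c-d", "e-f")
p <- showplot_str(testdata, inx = 2, iny = 3)
```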

Arun
alexwhan
  • sorry @alexwhan, it's not very clear for me - could you explain a little bit more? Thanks! – Vasily A Mar 10 '13 at 14:47
  • First version didn't actually answer your question - try the edit – alexwhan Mar 10 '13 at 14:48
  • @alexwhan, please check your answers before posting. I've made an edit each to both of your answers. – Arun Mar 10 '13 at 15:04
  • the very last version works for this example (the previous edit, with `aes` instead of `aes_string`, did not work). However, it doesn't work for my real data, because my table has hyphens in the column names, which causes an error during processing: for example, if the column is called `someName-one`, I get the error `Error in eval(expr, envir, enclos) : object 'someName' not found`. At the same time, these names do not cause any problem when I use your solution with `environment()`, so it is still preferable for me. – Vasily A Mar 10 '13 at 15:17
  • @VasilyA, the very last version is my edit. Yes, I thought of this problem with unusual names. But since you didn't mention it here, I didn't think it was your case. I'm glad you wrote it here, it'll help others. – Arun Mar 10 '13 at 15:29
  • @Arun - you're right, I started answering then my battery went flat... Had to follow up on phone without R... – alexwhan Mar 10 '13 at 15:30
  • @VasilyA, make sure to accept Arun's answer (and upvote it) since it solved your problem! – alexwhan Mar 10 '13 at 15:32
  • I did accept his answer already :) But I am grateful to your answer as well because it helped me to understand better some aspects of using aes. – Vasily A Mar 10 '13 at 15:38
10

A variation on @shadow's answer using new features from ggplot2 v3.0.0:

showplot <- function(indata, inx, iny){
  nms <- names(indata)
  x <- nms[inx]
  y <- nms[iny]
  p <- ggplot(indata, aes(x = !!ensym(x), y = !!ensym(y)))
  p + geom_point(size=4, alpha = 0.5)
}   

testdata <- data.frame(v1=rnorm(100), v2=rnorm(100), v3=rnorm(100), v4=rnorm(100), v5=rnorm(100))
names(testdata) <- c("a-b", "c-d", "e-f", "g-h", "i-j")
showplot(indata=testdata, inx=2, iny=3)

ensym creates a symbol from the string contained in a variable (so we first have to create those variables at the start of the function), then !! unquotes it, which means it will work as if you had fed the function raw names.

!! works only in the context of functions designed to support it, usually tidyverse functions; elsewhere it just means "not not" (similar to as.logical).
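
A small sketch of that difference, using sym() from rlang to build the symbol directly from a string (the variable names col and m are made up for this illustration):

```r
library(ggplot2)
library(rlang)

col <- "c-d"

# Inside aes(), !! injects the symbol built from the string
m <- aes(x = !!sym(col))
quo_get_expr(m$x)   # the symbol `c-d`

# Outside a function that supports it, !! is just two logical negations
!!5   # TRUE
!!0   # FALSE
```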

moodymudskipper
  • Thank you Antoine! If you will have some time, maybe you could add just couple words explaining how this works? (for other not-so-advanced users like myself) If not, here's my understanding: `!!` operator - pronounced '_bang-bang'_ - unquotes names previously quoted by `ensym()`, it's part of _quasiquotation_ which is described in more details, for example, [here](https://adv-r.hadley.nz/quasiquotation.html). (I think it's worth explaining because it took me quite some time to figure out; googling '!!' obviously doesn't work.) – Vasily A Nov 07 '18 at 04:09
2

For completeness, I think it's safer to use column names instead of indices because column positions within a data frame can be changed causing unexpected results.
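
A minimal illustration of that risk, with a made-up three-column data frame: after the columns are reordered, the same index silently points at a different variable.

```r
df <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9)

# The same data with the columns in a different order
reordered <- df[, c("v3", "v1", "v2")]

names(df)[2]         # "v2"
names(reordered)[2]  # "v1"
```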

The plot_duo function below (taken from this answer) accepts its input either as strings or as bare column names.

library(rlang)
library(purrr)
library(dplyr)
library(ggplot2)

theme_set(theme_classic(base_size = 14))
set.seed(123456)
testdata <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100), 
                       v4 = rnorm(100), v5 = rnorm(100))

plot_duo <- function(df, plot_var_x, plot_var_y) {

  # check if input is character or bare column name to 
  # use ensym() or enquo() accordingly
  if (is.character(plot_var_x)) {
    print('character column names supplied, use ensym()')
    plot_var_x <- ensym(plot_var_x)
  } else {
    print('bare column names supplied, use enquo()')
    plot_var_x <- enquo(plot_var_x)
  }

  if (is.character(plot_var_y)) {
    plot_var_y <- ensym(plot_var_y)
  } else {
    plot_var_y <- enquo(plot_var_y)
  }

  # unquote the variables using !! (bang bang) so ggplot can evaluate them
  pts_plt <- ggplot(df, aes(x = !! plot_var_x, y = !! plot_var_y)) + 
    geom_point(size = 4, alpha = 0.5)

  return(pts_plt)
}

Apply plot_duo function across columns using purrr::map()

### use character column names
plot_vars1 <- names(testdata)
plt1 <- plot_vars1 %>% purrr::map(., ~ plot_duo(testdata, .x, "v1"))
#> [1] "character column names supplied, use ensym()"
#> [1] "character column names supplied, use ensym()"
#> [1] "character column names supplied, use ensym()"
#> [1] "character column names supplied, use ensym()"
#> [1] "character column names supplied, use ensym()"

str(plt1, max.level = 1)
#> List of 5
#>  $ :List of 9
#>   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"
#>  $ :List of 9
#>   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"
#>  $ :List of 9
#>   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"
#>  $ :List of 9
#>   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"
#>  $ :List of 9
#>   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"

# test plot
plt1[[3]]

### use bare column names
# Ref: https://stackoverflow.com/a/49834499/
plot_vars2 <- rlang::exprs(v2, v3, v4)
plt2 <- plot_vars2 %>% purrr::map(., ~ plot_duo(testdata, .x, rlang::expr(v1)))
#> [1] "bare column names supplied, use enquo()"
#> [1] "bare column names supplied, use enquo()"
#> [1] "bare column names supplied, use enquo()"

str(plt2, max.level = 1)
#> List of 3
#>  $ :List of 9
#>   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"
#>  $ :List of 9
#>   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"
#>  $ :List of 9
#>   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"

# test plot
plt2[[2]]

Created on 2019-02-18 by the reprex package (v0.2.1.9000)

Tung
2

The aes_() and aes_quote() approaches are now soft-deprecated. An easy alternative that is also consistent with quasiquotation is to refer to the columns via the .data pronoun, as .data[[col_name]]. The column names are easy to look up by position. For example:

library(ggplot2)
library(dplyr)

showplot1<-function(indata, inx, iny){
  dat<-indata
  col_names <- names(indata)
  col_name_x <- col_names[[inx]]
  col_name_y <- col_names[[iny]]
  
  print(nrow(dat)); # this is just to show that object 'dat' is defined
  p <- ggplot(dat, aes(x=.data[[col_name_x]], y=.data[[col_name_y]]))
  p + geom_point(size=4, alpha = 0.5)
}

testdata<-data.frame(v1=rnorm(100), v2=rnorm(100), v3=rnorm(100), v4=rnorm(100), v5=rnorm(100))
showplot1(indata=testdata, inx=2, iny=3)
#> [1] 100

Created on 2021-09-22 by the reprex package (v2.0.0)

Bryan Shalloway
1

As a supplement to @moodymudskipper's answer, if you want to use it in a magrittr pipe, you can use

testdata %>% 
    ggplot(aes(x=!!sym(names(.)[2]), y=!!sym(names(.)[3]))) +
    geom_point(size=4, alpha = 0.5)

Also, if you want to use aes in a specific layer only, you can use

testdata %>% 
    {ggplot(., aes(x=!!sym(names(.)[2]))) +
    geom_point(aes(y=!!sym(names(.)[3])), size=4, alpha = 0.5)}

Certainly, you can wrap it into a function if you need:

showplot <- function(indata, inx, iny){
    indata %>% 
        ggplot(aes(x=!!sym(names(.)[inx]), y=!!sym(names(.)[iny]))) +
        geom_point(size=4, alpha = 0.5)
}

Although this would actually defeat the point of using a pipe.

FaniX
0

A provisional solution I found for the moment:

showplot1<-function(indata, inx, iny){
  dat<-data.frame(myX=indata[,inx], myY=indata[,iny])
  print(nrow(dat)); # this is just to show that object 'dat' is defined
  p <- ggplot(dat, aes(x=myX, y=myY))
  p + geom_point(size=4, alpha = 0.5)
}

But I don't really like it because my real code needs other columns from indata, and here I would have to define all of them explicitly in dat<-...
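
One way around that limitation, sketched here only as an assumption about what would fit, is to keep every original column and append the two helper columns. The name showplot_keep is made up, and the sketch assumes indata has no existing columns called myX or myY.

```r
library(ggplot2)

# Hypothetical variant: keep all columns of indata, add myX and myY
showplot_keep <- function(indata, inx, iny) {
  dat <- indata
  dat$myX <- indata[[inx]]  # copy of the x column, selected by position
  dat$myY <- indata[[iny]]  # copy of the y column, selected by position
  ggplot(dat, aes(x = myX, y = myY)) +
    geom_point(size = 4, alpha = 0.5)
}

testdata <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100))
p <- showplot_keep(testdata, inx = 2, iny = 3)
```

The other columns stay available to later layers, although the axis titles would still read myX and myY.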

Vasily A