18

Is it possible to use dplyr's mutate function without hard-coding the variable names? For example, the following code works, because I hard-code the name Var1:

            > d=expand.grid(1:3, 20:22)
            > d
              Var1 Var2
            1    1   20
            2    2   20
            3    3   20
            4    1   21
            5    2   21
            6    3   21
            7    1   22
            8    2   22
            9    3   22
            > d=mutate(d, x=percent_rank(Var1))
            > d
              Var1 Var2     x
            1    1   20 0.000
            2    2   20 0.375
            3    3   20 0.750
            4    1   21 0.000
            5    2   21 0.375
            6    3   21 0.750
            7    1   22 0.000
            8    2   22 0.375
            9    3   22 0.750

However, when I make the variable's name a variable, it no longer works:

            > my.variable='Var1'
            > d=mutate(d, x=percent_rank(my.variable))
            > d
              Var1 Var2   x
            1    1   20 NaN
            2    2   20 NaN
            3    3   20 NaN
            4    1   21 NaN
            5    2   21 NaN
            6    3   21 NaN
            7    1   22 NaN
            8    2   22 NaN
            9    3   22 NaN

The eval() and as.symbol() functions don't seem to help, either.
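
For reference, the `NaN` can be reproduced outside `mutate`. This is a minimal sketch, assuming dplyr is installed and attached: `my.variable` evaluates to the string itself, so `percent_rank` ranks a single character value instead of the column it names.

```r
library(dplyr)

my.variable <- "Var1"

# percent_rank() receives a length-one character vector, not the column Var1.
# With a single value the rescaled rank is (1 - 1) / (1 - 1), i.e. 0 / 0.
res <- percent_rank(my.variable)
print(res)   # NaN, which mutate() then recycles down the whole column
```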

Robert Bray
  • why do you want to do this? You are setting x equal to the `percent_rank` of a character string. What else would you expect to happen? – rawr Feb 25 '14 at 06:07

3 Answers

10

The great Hadley Wickham himself (hallowed be his name!) suggested this on the manipulatr Google Group:

d <- expand.grid(1:3, 20:22)
my.variable <- 'Var1'
percent_rank <- function(x) rank(x)/max(rank(x))
call <- substitute(mutate(d, percent_rank(var)), 
                   list(var = as.name(my.variable)))
eval(call)
#   Var1 Var2 percent_rank(Var1)
# 1    1   20              0.250
# 2    2   20              0.625
# 3    3   20              1.000
# 4    1   21              0.250
# 5    2   21              0.625
# 6    3   21              1.000
# 7    1   22              0.250
# 8    2   22              0.625
# 9    3   22              1.000
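
The call above leaves the new column named `percent_rank(Var1)`. As a minimal sketch, assuming a fresh session in which `percent_rank` is dplyr's own function rather than the redefinition above, the same `substitute`/`eval` pattern can name the column `x` as in the question:

```r
library(dplyr)

d <- expand.grid(1:3, 20:22)
my.variable <- "Var1"

# Build mutate(d, x = percent_rank(Var1)) as an unevaluated call:
# substitute() swaps the placeholder `var` for the symbol named by the string.
call <- substitute(
  mutate(d, x = percent_rank(var)),
  list(var = as.name(my.variable))
)
print(call)          # inspect the constructed call before running it
res <- eval(call)    # evaluate it as if it had been typed by hand
```

Printing the call first is a cheap way to check that the substituted symbol is the intended column before anything is evaluated.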
fabians
  • +1, although I would prefer a solution which does not include `eval`. But, hey, if it is good enough for Hadley :). – Paul Hiemstra Feb 25 '14 at 12:02
  • why u no like `eval`? – fabians Feb 25 '14 at 12:17
  • The main point is that `eval` is normally not needed, and less obscure solutions are present. See also http://stackoverflow.com/questions/13649979/what-specifically-are-the-dangers-of-evalparse. – Paul Hiemstra Feb 25 '14 at 13:16
  • I agree as far as things like `eval(parse(text=""))` are concerned, but I don't think avoiding `eval` is possible when you programmatically construct a `call`. – fabians Feb 25 '14 at 13:45
  • It is just that using eval to construct the call is a bit of a hack, native support for using strings would be preferable. – Paul Hiemstra Feb 25 '14 at 15:33
  • Agreed. It would be much nicer if `dplyr` could deal with this type of thing without forcing us to resort to this kind of "computing on the language". I'm sure that will come though, it's still in beta. – fabians Feb 25 '14 at 15:46
  • Yes, it's on the to do list. In the interim, I much prefer this sort of solution to using get with a custom environment. It's parse that's the problem, not eval. – hadley Feb 25 '14 at 21:56
5

You can use get and specify the environment in which the object "Var1" lives:

> my.variable = 'Var1'
> mutate(d, x = percent_rank(get(my.variable, envir = as.environment(d))))
  Var1 Var2     x
1    1   20 0.000
2    2   20 0.375
3    3   20 0.750
4    1   21 0.000
5    2   21 0.375
6    3   21 0.750
7    1   22 0.000
8    2   22 0.375
9    3   22 0.750

I suggest reading more about "non-standard evaluation" in the "Advanced R programming" wiki by Hadley Wickham: http://adv-r.had.co.nz/Computing-on-the-language.html

EDIT

This answer was recently upvoted, which made me realize that the solution I gave a year and a half ago was not great, so I am taking this opportunity to update it.

Since dplyr 0.3 you can use the standard-evaluation versions of dplyr's functions, that is, their underscore-suffixed "fun_" versions (for example, mutate_).

You also have to use interp from the lazyeval package if you are doing computations on the variables:

my.variable = "Var1"
expr <- lazyeval::interp(~percent_rank(x), x = as.name(my.variable))
mutate_(d, .dots = setNames(list(expr), "x"))
  Var1 Var2     x
1    1   20 0.000
2    2   20 0.375
3    3   20 0.750
4    1   21 0.000
5    2   21 0.375
6    3   21 0.750
7    1   22 0.000
8    2   22 0.375
9    3   22 0.750
Julien Navarre
3

In the development version of dplyr (soon to be released as 0.6.0), quosures and the unquote operators (!!, UQ) are available for evaluating quoted expressions in group_by/summarise/mutate, which makes this much easier:

 my.variable <- quo(Var1)
 percent_rank <- function(x) rank(x)/max(rank(x))
 d %>% 
   mutate(x = percent_rank(!!my.variable))
#  Var1 Var2     x
#1    1   20 0.250
#2    2   20 0.625
#3    3   20 1.000
#4    1   21 0.250
#5    2   21 0.625
#6    3   21 1.000
#7    1   22 0.250
#8    2   22 0.625
#9    3   22 1.000

It also lets you pass the name of the new column as a variable:

mynewvar <- 'x'
d %>% 
   mutate(!!mynewvar := percent_rank(!!my.variable))
#  Var1 Var2     x
#1    1   20 0.250
#2    2   20 0.625
#3    3   20 1.000
#4    1   21 0.250
#5    2   21 0.625
#6    3   21 1.000
#7    1   22 0.250
#8    2   22 0.625
#9    3   22 1.000

We can also wrap this in a function and pass the column names as arguments:

f1 <- function(dat, myvar, colN){
  myvar <- enquo(myvar)
  colN <- quo_name(enquo(colN))
 
  dat %>%
      mutate(!!colN := percent_rank(!!myvar))
 }

f1(d, Var1, x)
#  Var1 Var2     x
#1    1   20 0.250
#2    2   20 0.625
#3    3   20 1.000
#4    1   21 0.250
#5    2   21 0.625
#6    3   21 1.000
#7    1   22 0.250
#8    2   22 0.625
#9    3   22 1.000

In the function above, enquo plays a role similar to substitute in base R: it captures the argument supplied by the user and converts it to a quosure. Because the new column name is needed as a string, quo_name converts the quosure to a string, and the evaluation inside the mutate call is done by unquoting (!! or UQ).
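
The question starts from a column name held as a string, whereas the examples above start from a bare name. The following is a minimal sketch of two ways to bridge that gap, assuming a dplyr version with tidy evaluation (0.7.0 or later), the rlang package installed, and dplyr's own `percent_rank` rather than the redefinition used above. The names `res1` and `res2` are only illustrative.

```r
library(dplyr)

d <- expand.grid(1:3, 20:22)
my.variable <- "Var1"   # existing column, given as a string
mynewvar    <- "x"      # name of the column to create

# Option 1: turn the string into a symbol, then unquote it with !!
res1 <- d %>%
  mutate(!!mynewvar := percent_rank(!!rlang::sym(my.variable)))

# Option 2: index the .data pronoun with the string
res2 <- d %>%
  mutate(!!mynewvar := percent_rank(.data[[my.variable]]))
```

Both forms should give the same data frame. The `.data[[...]]` form avoids building a symbol at all, which can be easier to read when the name is always a string.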

data

d <- expand.grid(1:3, 20:22)
akrun