10

I would like to pass a data frame and its columns to be processed by dplyr's mutate within a function.

Here is an example

multifun <- function(dataf,vari){
 mutate(dataf,newvar=vari*2)
}

multifun(mtcars,gear)

The problem with this function is that the variable 'gear' is not recognized as an object when the function is called. Specifically, I get the error:

Error in mutate_impl(.data, named_dots(...), environment()) : object 'gear' not found

This appears to be a question of which environment dplyr's mutate searches when it looks for the variable.

I understand that

multifun(mtcars,mtcars$gear)

will give me the answer that I want, namely

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb newvar
1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4      8
2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4      8
3 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1      8

but I would like to see if there is a way of avoiding the need to reference each variable used from the data frame in the function call.

I am also aware that calling mutate directly, outside of a function, works without problems: mutate(mtcars,newvar=gear*2) does the job. However, I am trying to understand how dplyr's mutate searches the different environments for the variable when it is placed inside a function.

eli-k
Robert
  • See if the solutions suggested here help at all http://stackoverflow.com/questions/21815060/dplyr-how-to-use-group-by-inside-a-function – konvas Jul 07 '14 at 09:02

4 Answers

4

This is really ugly to me, but seems to work. Basically, I tried using get but it didn't seem to know where to look, so I specified the environment.

multifun <- function(dataf, vari){
  vari <- deparse(substitute(vari))
  mutate(dataf, newvar = get(vari, envir = as.environment(dataf)) * 2)
}

Output:

multifun(mtcars, gear)
#                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb newvar
# Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4      8
# Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4      8
# Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1      8
# <<<SNIP>>>
# Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6     10
# Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8     10
# Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2      8
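
To illustrate why the original version cannot find `gear` while the captured name can, here is a minimal base R sketch. It only imitates the lookup order (data frame columns first, then the function's environment) and is not dplyr's actual implementation; the names `lookup_fails` and `lookup_works` are hypothetical.

```r
# Base R only. `lookup_fails` and `lookup_works` are hypothetical names used
# for illustration; this imitates the lookup order, not dplyr's internals.

# The expression `vari * 2` is evaluated with the data frame searched first
# and the function's own environment second. `vari` is not a column, so it is
# found in the function environment as an unevaluated argument. Forcing it
# evaluates `gear` where the function was called, where no such object exists.
lookup_fails <- function(dataf, vari) {
  fun_env <- environment()
  eval(quote(vari * 2), dataf, fun_env)
}

# substitute() replaces `vari` with the expression the caller typed, giving
# `gear * 2`, which can then be evaluated against the data frame's columns.
lookup_works <- function(dataf, vari) {
  expr <- substitute(vari * 2)
  eval(expr, dataf, parent.frame())
}

# Capture the error message instead of stopping the script.
res_fail <- tryCatch(
  lookup_fails(mtcars, gear),
  error = function(e) conditionMessage(e)
)
res_ok <- lookup_works(mtcars, gear)
```

Here `res_fail` holds the "not found" message for `gear`, while `res_ok` is the doubled `gear` column.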
A5C1D2H2I1M1N2O1R2T1
4

Looking at @Ananda's solution, this seems to be the simplest hack:

multifun <- function(dataf, vari){
  dataf <- mutate(dataf, newvar = dataf[, vari]*2)
  return(dataf)
}

multifun(mtcars,"gear")

Again, taking into account @Ananda's suggestion, you could also do

multifun <- function(dataf, vari){  
  vari <- deparse(substitute(vari))
  dataf <- mutate(dataf, newvar = dataf[, vari]*2)   
  return(dataf) 
}

multifun(mtcars, gear)
David Arenburg
  • 2
    You can also add a line `vari <- deparse(substitute(vari))` if you wanted to use an unquoted value as input for the second argument. I don't know if I would consider this a hack. To me, it's more readable than mine--I was just posting an answer to see if it helped me to understand the `environment()` part of the error. – A5C1D2H2I1M1N2O1R2T1 Jul 07 '14 at 10:03
  • This indeed seems to be the simplest. Still it seems that all the solutions result in verbose code if instead of having one variable we have many of them, or if these variables appear in a complex code. Thanks anyway for all the answers. – Robert Jul 07 '14 at 19:53
  • It seems then that if the code is complex or it involves many variables from the data frame the least verbose solution might be to specify the source of the variables in the function call as indicated above, i.e. `multifun(mtcars,mtcars$gear)`, or plainly "hard coding" the gear variable in the function definition as in [this](http://stackoverflow.com/questions/24459752/can-dplyr-package-be-used-for-conditional-mutating?rq=1) post – Robert Jul 07 '14 at 20:04
3

With dplyr 0.7.0, this can now be done with tidyeval:

multifun <- function(dataf,vari){
  mutate(dataf,newvar = UQ(enquo(vari))*2)
}

multifun(mtcars,gear)

enquo captures the expression supplied as the function argument and bundles it, together with the environment in which the function was called, into a quosure. UQ or !! can then be used to unquote the quosure so that it is evaluated within mutate.
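
As a hedged sketch of the same idea with the `!!` spelling, plus the `{{ }}` shorthand, which assumes rlang 0.4.0 or later is installed alongside dplyr: the function names `multifun_bang`, `multifun_curly` and `multifun_two` are hypothetical. The last one shows that several columns can be passed unquoted without much extra code.

```r
library(dplyr)

# Hypothetical helper names; assumes dplyr >= 0.7.0 for enquo()/!! and
# rlang >= 0.4.0 for the {{ }} operator.

# Capture the argument as a quosure, then unquote it inside mutate().
# The parentheses make the intended precedence explicit.
multifun_bang <- function(dataf, vari) {
  vari <- enquo(vari)
  mutate(dataf, newvar = (!!vari) * 2)
}

# {{ vari }} combines the capture and unquote steps in one operator.
multifun_curly <- function(dataf, vari) {
  mutate(dataf, newvar = {{ vari }} * 2)
}

# The same pattern with two unquoted columns.
multifun_two <- function(dataf, x, y) {
  mutate(dataf, newvar = {{ x }} * 2 + {{ y }})
}

out_bang  <- multifun_bang(mtcars, gear)
out_curly <- multifun_curly(mtcars, gear)
out_two   <- multifun_two(mtcars, gear, carb)
```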

acylam
2

Or

multifun1 <- function(dataf, vari){
  eval(substitute(mutate(dataf, newvar=vari*2), list(vari=as.name(vari))))
}

multifun1(mtcars,"gear") 

To use an unquoted value, it would be better to use @Ananda Mahto's suggestion:

multifun1 <- function(dataf, vari){
  vari <- deparse(substitute(vari))
  eval(substitute(mutate(dataf, newvar=vari*2), list(vari=as.name(vari))))
}

multifun1(mtcars,gear)
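
To see what the substitute step produces before eval runs it, the call can be built and inspected on its own. This is a small base R sketch; `col_name` and `built` are hypothetical names, and nothing here is evaluated by dplyr.

```r
# Base R only: build the call without evaluating it.
# `col_name` and `built` are hypothetical names for illustration.
col_name <- "gear"

# as.name() turns the string into a symbol, and substitute() swaps that
# symbol in for `vari` inside the unevaluated mutate() call.
built <- substitute(
  mutate(dataf, newvar = vari * 2),
  list(vari = as.name(col_name))
)

print(built)
```

The printed call refers to `gear` directly, which is why evaluating it inside the function behaves like typing the column name by hand.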
Robert
akrun