
For the sake of an example, say I want to write a function that copies a column of my choice into a new column.

library(dplyr)

testdata <- data.frame(
  diff1 = 1:10,
  diff2 = seq(21:30)  # seq() over a length-10 vector returns 1:10, not 21:30
)

goal <- testdata %>%
  mutate(newdiff1 = diff1)

So I create a function:

funtest <- function(dat,var,newvar){
  dat %>%
    mutate(newvar = var)
}

However,

test2 <- funtest(testdata,diff1,newdiff1)

would return an error:

 Error: object 'diff1' not found 

This format works

nondesiredformat <- funtest(testdata,testdata$diff1,newdiff1)

but the new variable is then always called "newvar" instead of the name given as the third argument.

Is there a way to change the function so that the arguments used in test2 work?

Thank you

MrFlick
aiorr

3 Answers


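Inside the function, we can use the curly-curly operator {{ }} to evaluate unquoted variable names passed as arguments; it is shorthand for enquo() followed by !!. For the assignment, use := instead of =, because the name on the left-hand side is injected rather than written literally.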

funtest <- function(dat, var, newvar) {
  dat %>%
    mutate({{ newvar }} := {{ var }})
}
funtest(testdata, diff1, newdiff1)
#    diff1 diff2 newdiff1
#1      1     1        1
#2      2     2        2
#3      3     3        3
#4      4     4        4
#5      5     5        5
#6      6     6        6
#7      7     7        7
#8      8     8        8
#9      9     9        9
#10    10    10       10
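The equivalence with enquo() and !! mentioned above can be sketched as follows. This is only an illustration: the name funtest_enquo is hypothetical, and converting the captured name to a string with rlang::as_name() is one choice among several for the left-hand side of :=.

```r
library(dplyr)

# Hypothetical name, chosen to keep it apart from funtest above.
funtest_enquo <- function(dat, var, newvar) {
  var <- enquo(var)                          # capture the column expression
  new_name <- rlang::as_name(enquo(newvar))  # capture the new name as a string
  dat %>%
    mutate(!!new_name := !!var)              # inject both into mutate()
}

testdata <- data.frame(diff1 = 1:10, diff2 = 1:10)
res <- funtest_enquo(testdata, diff1, newdiff1)
names(res)
# "diff1" "diff2" "newdiff1"
```

The {{ }} form is shorter and is usually preferable; the explicit version is mainly useful for seeing what happens at each step.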
akrun

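You can use bquote for this, provided var and newvar hold symbols (for example, captured with substitute()) when the expression is built: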

eval(bquote(
  dat %>% 
    mutate(.(newvar) := .(var))
))

You could also do the update the old-school way in your particular case, with the column names passed as strings:

dat[[newvar]] = dat[[var]]
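A minimal sketch of both ideas wrapped into complete functions is given below. The function names funtest_bquote and funtest_base are hypothetical, and capturing the arguments with substitute() is an assumption added here so that unquoted names can be passed as in the question.

```r
library(dplyr)

# bquote() version: unquoted column names are captured as symbols first.
funtest_bquote <- function(dat, var, newvar) {
  var <- substitute(var)        # symbol, e.g. diff1
  newvar <- substitute(newvar)  # symbol, e.g. newdiff1
  eval(bquote(
    dat %>%
      mutate(.(newvar) := .(var))
  ))
}

# Base R version: column names are passed as strings.
funtest_base <- function(dat, var, newvar) {
  dat[[newvar]] <- dat[[var]]
  dat
}

testdata <- data.frame(diff1 = 1:10, diff2 = 1:10)
res_bquote <- funtest_bquote(testdata, diff1, newdiff1)
res_base <- funtest_base(testdata, "diff1", "newdiff1")
```

The base R version avoids non-standard evaluation entirely, at the cost of quoting the column names at the call site.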
MichaelChirico

If you start writing functions that take variable names as arguments, you might find data.table more convenient than dplyr. I recently wrote a post on the subject. Standard evaluation is easier to handle with data.table than with dplyr, in my opinion.

With data.table, you have several ways to use column names as arguments:

Using get

You can use get, which looks up the value bound to a name in a given scope. Here the scope is your data.table:

library(data.table)
funtest <- function(dat,var,newvar){
  dat[, (newvar) := get(var)]
}

:= is an update-by-reference operator. If you want to know more about it, the data.table vignettes are a good place to start. Calling the function:

dt = data.table(iris)

funtest(dt, "Species","x")[]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species         x
  1:          5.1         3.5          1.4         0.2    setosa    setosa
  2:          4.9         3.0          1.4         0.2    setosa    setosa
  3:          4.7         3.2          1.3         0.2    setosa    setosa
  4:          4.6         3.1          1.5         0.2    setosa    setosa
  5:          5.0         3.6          1.4         0.2    setosa    setosa
 ---                                                                      
146:          6.7         3.0          5.2         2.3 virginica virginica
147:          6.3         2.5          5.0         1.9 virginica virginica
148:          6.5         3.0          5.2         2.0 virginica virginica
149:          6.2         3.4          5.4         2.3 virginica virginica
150:          5.9         3.0          5.1         1.8 virginica virginica
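Because := updates by reference, the function above also modifies the data.table that was passed in. The sketch below illustrates this on a small table modelled on the question's data; the object names dt_ref, dt_orig and res are only for illustration, and copy() is used where the original should stay untouched.

```r
library(data.table)

funtest <- function(dat, var, newvar) {
  dat[, (newvar) := get(var)]
}

# Passing the table directly: the caller's object gains the new column.
dt_ref <- data.table(diff1 = 1:10)
funtest(dt_ref, "diff1", "newdiff1")
names(dt_ref)

# Passing a copy: the original table is left unchanged.
dt_orig <- data.table(diff1 = 1:10)
res <- funtest(copy(dt_orig), "diff1", "newdiff1")
names(dt_orig)
```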

Using .SD

You can also use .SD, which stands for Subset of Data. This is more convenient when you have several quoted variable names, and it avoids the !!!rlang::syms(...) construct needed with dplyr.

This lets you express fairly complicated computations with a very concise syntax:

df[, (newcolnames) := lapply(.SD, mean), by = grouping_var, .SDcols = xvars]
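As a runnable illustration of the .SD pattern, here is a small sketch on iris. The choice of columns, the "mean_" prefix and the grouping variable are assumptions made for the example; note that the vector of new names is wrapped in parentheses so that data.table evaluates it instead of creating a column literally called newcolnames.

```r
library(data.table)

dt <- data.table(iris)
xvars <- c("Sepal.Length", "Sepal.Width")  # columns to summarise
newcolnames <- paste0("mean_", xvars)      # names for the new columns
grouping_var <- "Species"                  # grouping column, given as a string

# Add one group-wise mean column per variable listed in xvars.
dt[, (newcolnames) := lapply(.SD, mean), by = grouping_var, .SDcols = xvars]

head(dt)
```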
linog