If you start to write functions that take column names as arguments, you might find data.table more convenient than dplyr. I recently wrote a post on the subject. Standard evaluation is easier to handle with data.table than with dplyr, in my opinion.
With data.table, you have several ways to use column names as arguments.
Using get
You can use get, which maps a name to a value in a given scope. Here the scope is your data.table:
library(data.table)

funtest <- function(dat, var, newvar){
  # get(var) looks up the column named in var within the data.table's scope
  dat[, (newvar) := get(var)]
}
:= is an update-by-reference operator. If you want to know more about it, the data.table vignettes are a good place to start. Calling the function:
dt <- data.table(iris)
funtest(dt, "Species", "x")[]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species         x
  1:          5.1         3.5          1.4         0.2    setosa    setosa
  2:          4.9         3.0          1.4         0.2    setosa    setosa
  3:          4.7         3.2          1.3         0.2    setosa    setosa
  4:          4.6         3.1          1.5         0.2    setosa    setosa
  5:          5.0         3.6          1.4         0.2    setosa    setosa
 ---
146:          6.7         3.0          5.2         2.3 virginica virginica
147:          6.3         2.5          5.0         1.9 virginica virginica
148:          6.5         3.0          5.2         2.0 virginica virginica
149:          6.2         3.4          5.4         2.3 virginica virginica
150:          5.9         3.0          5.1         1.8 virginica virginica
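If you need to look up several column names at once, get has a plural counterpart, mget, which returns a list of values. As a hypothetical extension of the function above (funtest_many is not from the original post, just an illustration):

```r
library(data.table)

# Hypothetical helper: copy several existing columns under new names.
# mget(vars) looks up each name in vars within the data.table's scope
# and returns them as a list, which := assigns to the new columns.
funtest_many <- function(dat, vars, newvars){
  dat[, (newvars) := mget(vars)]
}

dt <- data.table(iris)
funtest_many(dt, c("Species", "Sepal.Length"), c("x", "y"))[]
```

The parentheses around newvars tell data.table to use the contents of the variable as column names, rather than creating a single column literally called "newvars".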
Using .SD
You can also use .SD, which stands for Subset of Data. This is more convenient when you have several quoted variables: it avoids the !!!rlang::syms splicing necessary with dplyr.
You can find yourself performing complicated computations with a very concise syntax:

df[, (newcolnames) := lapply(.SD, mean), by = grouping_var, .SDcols = xvars]
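As a concrete sketch of that one-liner (the column choices here are my own, not from the original post), computing per-species means of two numeric columns of iris:

```r
library(data.table)

dt <- data.table(iris)
xvars <- c("Sepal.Length", "Petal.Length")  # columns to summarize
newcolnames <- paste0("mean_", xvars)       # names for the new columns

# For each Species, add a column holding the group mean of each xvar.
# .SD is the subset of dt restricted to .SDcols within each group.
dt[, (newcolnames) := lapply(.SD, mean), by = Species, .SDcols = xvars]
head(dt)
```

Every row of a given species carries the same group mean, since := recycles the per-group result across the group's rows.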