SO #24833247 covers nearly all the use cases for passing column names dynamically to a data.table
within a function. However, it misses one I'm currently trying to address: passing variables to the i
expression.
I'm trying to refactor some data cleansing code into a function that converts certain values to NA
after I've pulled the data into a data.table.
For example, given the following:
dt <- data.table(colA = c('A', 'b', '~', 'd', ''), colB = c('', '?', 'a1', 'a2', 'z4'))
dt[colA %in% c('~', ''), colA := NA]
dt[colB %in% c('~', ''), colB := NA]
I want a generic function that replaces the '~', '?' and '' values with NA, instead of having to explicitly code each transformation:
dt <- data.table(colA = c('A', 'b', '~', 'd', ''), colB = c('', '?', 'a1', 'a2', 'z4'))
clearCol(dt, colA)
clearCol(dt, colB)
The j expression is straightforward:
clearCol <- function(dt, f) {
f = substitute(f)
dt[,(f) := NA]
}
clearCol(data.table(colA = c('A', 'b', '~', 'd', '')), colA)[]
x
1: NA
2: NA
3: NA
4: NA
5: NA
However, extending it to add the variable to the i expression fails:
clearCol <- function(dt, f) {
f = substitute(f)
dt[(f) %in% c('~', ''),(f) := NA]
}
clearCol(data.table(colA = c('A', 'b', '~', 'd', '')), colA)[]
Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments
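As far as I can tell, the error comes from what substitute(f) returns: a symbol, not a vector. Since f is not a column of the table, (f) in i presumably resolves to that symbol in the function's environment, and %in% then passes it straight to match(). A minimal base R sketch of that reading, where quote(colA) is my stand-in for the result of substitute(f):

```r
# `f` stands in for what substitute(f) returns inside clearCol(): a symbol.
f <- quote(colA)

class(f)       # "name"
is.vector(f)   # FALSE: a symbol is not a vector

# %in% is a thin wrapper around match(), so a symbol on the left-hand side
# triggers the same error seen above.
msg <- tryCatch(f %in% c('~', ''), error = function(e) conditionMessage(e))
msg
```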
Swapping to this seems to work, but the lack of output with verbose = TRUE (compared to the hard-coded method at the top) leaves me concerned that it will not scale well to the large data sets I'm working with:
clearCol <- function(dt, f) {
f = deparse(substitute(f))
dt[get(f) %in% c('~', ''),(f) := NA]
}
clearCol(data.table(colA = c('A', 'b', '~', 'd', '')), colA)[]
colA
1: A
2: b
3: NA
4: d
5: NA
Is there another way of doing what I want?
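One direction that might avoid get() entirely is to compute the matching row numbers outside of [.data.table and update them with set(), which assigns by reference. This is only a sketch under assumptions: the bad argument is a hypothetical addition of mine, the columns are assumed to be character, and I have not benchmarked it against the hard-coded version.

```r
library(data.table)

# Hypothetical variant of clearCol(): the `bad` argument and the use of set()
# are assumptions for illustration, not part of the original code.
clearCol <- function(dt, f, bad = c('~', '?', '')) {
  col  <- deparse(substitute(f))        # column name as a string, e.g. "colA"
  rows <- which(dt[[col]] %in% bad)     # row numbers holding a placeholder value
  if (length(rows)) {
    # Overwrite only the matching rows, by reference (assumes a character column).
    set(dt, i = rows, j = col, value = NA_character_)
  }
  invisible(dt)
}

dt <- data.table(colA = c('A', 'b', '~', 'd', ''), colB = c('', '?', 'a1', 'a2', 'z4'))
clearCol(dt, colA)
clearCol(dt, colB)
dt[]
```

Because dt[[col]] is a plain vector lookup, the filter here does not depend on how the i expression is evaluated. Whether it scales as well as the hard-coded dt[colA %in% ...] form is something I would still need to measure.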