How to pass the argument of a user-defined function to be a column name in data.table?

Question

How can I pass a function argument so that it is used as a column name of a data.table inside the function? For example, I have a data.table called data1 with columns called 'hours' and 'location'. In the output, I want to find the outliers by location and have the result column named 'hours'. I tried substitute(y) and similar approaches, but the output always uses 'y' as the column name. Could anyone help me? Thank you.

mf <- function(data, y){
  newy <- as.name(deparse(substitute(y)))
  output <- data[, .(y = boxplot.stats(eval(newy))$out), by = .(location)]
  return(output)
}
mf(data = data1, y = hours)

See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 regarding how to make a good example. — Frank, Aug 11 '17 at 16:43

score 0 · Answer 1 · answered Aug 11 '17 at 16:58

It's better to write functions which take character values for choosing columns. In this case, your function can be rewritten as:

mf <- function(data, y){
  output <- data[, boxplot.stats(get(y))['out'], by = .(location)]
  setnames(output, 'out', y)
  return(output)
}

Subsetting the output of boxplot.stats with [ (rather than $) returns a named list with one element ('out'), so output will have two columns: location and out. Then you just need to rename out to whatever was given for y, which is what setnames does.
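The difference between [ and $ on the list returned by boxplot.stats can be checked in base R. This is a minimal sketch; the vector x is made-up illustration data, not taken from the question.

```r
# Made-up data with one obvious outlier (illustration only).
x <- c(1, 2, 3, 4, 5, 100)
stats <- boxplot.stats(x)

single <- stats["out"]  # `[` keeps the list wrapper and the element name
bare   <- stats$out     # `$` extracts the bare numeric vector

is.list(single)  # TRUE
names(single)    # "out"
is.list(bare)    # FALSE
bare             # 100
```

Because single is a named list, data.table uses its name for the result column when it is returned from j, whereas a bare vector gets the default name V1.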

Example:

set.seed(100)
data1 <- data.table(
  location = state.name,
  hours    = rpois(1000, 12)
)
mf(data = data1, y = 'hours')
#           location hours
#  1:       Delaware    25
#  2:        Georgia    21
#  3:          Idaho     4
#  4:  Massachusetts     5
#  5:       Missouri     7
#  6: South Carolina     5
#  7: South Carolina     6
#  8:   South Dakota    20
#  9:          Texas     5
# 10:           Utah    22

Non-standard evaluation is tricky and only worth the effort if you can get something out of it. data.table uses it for optimization behind the scenes. tidyverse packages use it to allow in-database processing. If there's no benefit (besides not having to type a few quotation marks), there's only a cost.
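If the unquoted call style from the question is still wanted, one option is to keep the character-based function as the workhorse and add a thin wrapper that converts the bare name to a string. The following is only a sketch under assumptions: mf_nse and the small table toy are hypothetical names and data introduced for illustration, and mf is repeated from above so the block runs on its own.

```r
library(data.table)

# Character-based version, repeated from the answer above.
mf <- function(data, y) {
  output <- data[, boxplot.stats(get(y))["out"], by = .(location)]
  setnames(output, "out", y)
  output
}

# Hypothetical wrapper: captures the unquoted argument as a string
# and delegates to the character-based function.
mf_nse <- function(data, y) {
  mf(data, deparse(substitute(y)))
}

# Made-up data: group A has one outlier (100), group B has none.
toy <- data.table(
  location = rep(c("A", "B"), each = 6),
  hours    = c(1, 2, 3, 4, 5, 100,  1, 2, 3, 4, 5, 6)
)

res_chr <- mf(toy, "hours")    # quoted column name
res_nse <- mf_nse(toy, hours)  # unquoted column name
```

Both calls should give the same table, with the outlier column named hours. The wrapper only handles a bare column name typed directly at the call site; when the name is stored in a variable, the character version is still the one to call.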

Thank you Nathan Werth, your code works. I appreciate you pointing out the eval() trouble. The example here is simplified and only has location and hours. In my real code, there is one more column: the endpoint of the whisker (boxplot(hours)$stats[5]). As a result, I wasn't able to use ['out']. But I managed to use boxplot(hours)$out and setnames(output, 'V1', y) to rename the V1 column. Thank you. — xyx, Aug 11 '17 at 20:35
BTW, thank you for providing the solution to this issue. It just feels a little unusual to me to put quotation marks around a column name in R function arguments. — xyx, Aug 11 '17 at 20:43
