I'm trying to use data.table, but finding the syntax extremely clunky to work with.
In the app I'm working on, I need to loop through various features and find the unique column name corresponding to each one. (That part is fine: I can pull the name from a csv lookup file and store it as a string in each iteration of the loop.) What I'm having trouble with is referencing that column dynamically within the data.table syntax when I need to perform some operation on it (e.g. sum, mean, etc.).
In this particular instance, I need to calculate the mean of a given column. I've made a dumbed-down example using the iris dataset. I need to be able to reference the column name via a variable: here I want the average of the Sepal.Width column without referencing it directly, using the variable iris_var instead.
```r
library(data.table)

# The name of the column that we're interested in, stored as a string.
iris_var <- 'Sepal.Width'

# Data table.
iris_data <- as.data.table(iris)

# This works. Including with=FALSE makes data.table treat iris_var as the name
# of a column to select rather than as an expression to evaluate.
iris_data[, iris_var, with=FALSE]

# This calculates the mean of Sepal.Width by referring to it directly. However,
# I need to get the same result by referring to Sepal.Width dynamically via iris_var.
iris_data[, .(average=mean(Sepal.Width))]

# This does not work.
iris_data[, .(average=mean(iris_var)), with=FALSE]
```
The last line produces the following error:
```
Error in `[.data.table`(iris_data, , .(average = mean(iris_var)), with = FALSE) :
  When with=FALSE, j-argument should be of type logical/character/integer indicating the columns to select.
In addition: Warning message:
In mean.default(iris_var) :
  argument is not numeric or logical: returning NA
```
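Both messages point at the same problem. Inside `mean(iris_var)`, `iris_var` is just the character string `"Sepal.Width"`, so `mean()` is asked to average a string (hence the warning and `NA`). And with `with=FALSE`, `j` is expected to be a plain column selector, not an expression like `.(average = ...)` (hence the error). A minimal sketch of two commonly used workarounds, assuming a reasonably recent data.table: `get()` looks the name up among the table's columns, and `.SDcols` restricts `.SD` to the named column(s). The names `avg_get` and `avg_sdcols` are just illustrative.

```r
library(data.table)

iris_var <- 'Sepal.Width'
iris_data <- as.data.table(iris)

# get() resolves the string to the column of that name inside j,
# so mean() receives the numeric Sepal.Width column, not the string.
avg_get <- iris_data[, .(average = mean(get(iris_var)))]

# Alternative: pass the name via .SDcols; .SD then contains only that column,
# and lapply() applies mean() to it.
avg_sdcols <- iris_data[, lapply(.SD, mean), .SDcols = iris_var]
# lapply() keeps the original column name, so rename it to match the first result.
setnames(avg_sdcols, "average")

print(avg_get)
print(avg_sdcols)
```

Both should give a one-row table with the same `average`; `.SDcols` tends to scale better when several columns are involved, while `get()` reads closer to the original attempt.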
Can anyone tell me what I'm doing wrong?
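The same idea carries over to the loop described in the second paragraph. A minimal sketch, assuming the column names from the csv lookup file have already been read into a character vector; `feature_cols` below is a hypothetical stand-in for that lookup, and `results` / `summary_table` are illustrative names only.

```r
library(data.table)

iris_data <- as.data.table(iris)

# Hypothetical stand-in for the column names read from the csv lookup file.
feature_cols <- c('Sepal.Length', 'Sepal.Width', 'Petal.Length')

results <- list()
for (col in feature_cols) {
  # `col` holds the column name as a string; get() turns it into the column.
  results[[col]] <- iris_data[, .(feature = col,
                                  average = mean(get(col)),
                                  total   = sum(get(col)))]
}

# Stack the per-feature one-row tables into a single summary table.
summary_table <- rbindlist(results)
print(summary_table)

# If only one statistic is needed, .SDcols can do it without an explicit loop:
iris_data[, lapply(.SD, mean), .SDcols = feature_cols]
```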