Pull a subset of data using a variable reference to a column name in R

Question

Suppose I am working with the iris data, which is of class data.frame, and I store a column name in a variable, col <- "Species". I want to pull the following subset:

iris[iris$Petal.Width == 0.2, c("Sepal.Width", "Petal.Width", col)]

The code works and returns a table as expected. However, if I convert the data to a data.table and run the same line of code, I get only the column names back instead of the subset:

iris[iris$Petal.Width == 0.2, c("Sepal.Width", "Petal.Width", col)]
[1] "Sepal.Width" "Petal.Width" "Species"

How would I change the notation to get the same result from a data.table?

Use `as.data.table(iris)[Petal.Width == 0.2, c("Sepal.Width", "Petal.Width", col), with = FALSE]`, which is short for `as.data.table(iris)[Petal.Width == 0.2, .SD, .SDcols = c("Sepal.Width", "Petal.Width", col)]` — Maurits Evers, Jul 25 '18 at 22:24
I'm not sure this is a duplicate. This is asking how to access a subset of columns using a mixture of character values and R-names. The cited response, admittedly from the most authoritative possible source, didn't tell me how one was supposed to do that. Your comment might better be converted to an answer, or you might be able to find a better duplicate. — IRTFM, Jul 25 '18 at 22:34
@42- Ok, fair enough. I've re-opened, and included my comment as an answer. I'll look for a better dupe fit. — Maurits Evers, Jul 25 '18 at 23:19

Maurits Evers · Accepted Answer · 2018-07-25T23:21:51.477

I still think this is somewhat of a duplicate of the question "Select / assign to data.table variables which names are stored in a character vector", but while I look for a better fit, let's address the question.

You can use with = FALSE:

library(data.table)

col <- "Species"
as.data.table(iris)[Petal.Width == 0.2, c("Sepal.Width", "Petal.Width", col), with = FALSE]
#    Sepal.Width Petal.Width Species
# 1:         3.5         0.2  setosa
# 2:         3.0         0.2  setosa
# 3:         3.2         0.2  setosa
# 4:         3.1         0.2  setosa
# 5:         3.6         0.2  setosa
# 6:         3.4         0.2  setosa
# ...

which is the same as

as.data.table(iris)[Petal.Width == 0.2, .SD, .SDcols = c("Sepal.Width", "Petal.Width", col)]

From the ?data.table documentation:

with: By default ‘with=TRUE’ and ‘j’ is evaluated within the frame of ‘x’; column names can be used as variables.

When ‘with=FALSE’ ‘j’ is a character vector of column names, a numeric vector of column positions to select or of the form ‘startcol:endcol’, and the value returned is always a ‘data.table’. ‘with=FALSE’ is often useful in ‘data.table’ to select columns dynamically. Note that ‘x[, cols, with=FALSE]’ is equivalent to ‘x[, .SD, .SDcols=cols]’.

[Bold emphasis is mine]
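
As a quick check of the equivalence quoted above, the following sketch runs both forms and compares the results. The names dt, cols, a and b are introduced here for illustration and are not part of the original answer.

```r
library(data.table)

dt <- as.data.table(iris)   # illustrative name, not from the original answer
col <- "Species"
cols <- c("Sepal.Width", "Petal.Width", col)

# Select columns by name with with = FALSE
a <- dt[Petal.Width == 0.2, cols, with = FALSE]

# Select the same columns via .SD and .SDcols
b <- dt[Petal.Width == 0.2, .SD, .SDcols = cols]

# Compare the two results
all.equal(a, b)
```

Building the character vector once and reusing it keeps the two calls directly comparable.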

FYI, with=FALSE may eventually be removed. From the NEWS (https://github.com/Rdatatable/data.table/blob/master/NEWS.md): "Thus, with= should no longer be needed in any cases. Please change to using the .. prefix and over the next few years we will start to formally deprecate and remove the with= parameter." — Frank, Jul 26 '18 at 14:20
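
A minimal sketch of the .. prefix mentioned in the NEWS quote, assuming an installed data.table version that supports it. The full column vector is built first (cols and res are illustrative names, not from the original posts) so that j is a single ..-prefixed symbol, which data.table looks up in the calling scope rather than among the columns.

```r
library(data.table)

dt <- as.data.table(iris)   # illustrative name
col <- "Species"

# Combine the fixed column names with the one held in the variable
cols <- c("Sepal.Width", "Petal.Width", col)

# ..cols tells data.table to take cols from the calling scope
res <- dt[Petal.Width == 0.2, ..cols]

names(res)
```

This avoids the with= argument entirely, which is the direction the NEWS entry recommends.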
