R - subset data frame column using names in vector list

Question

Is it possible to subset data frame columns (into a new df) using column names stored in a vector, like c("col1", "col9", "col6")? I know I can reference one column in a df using the df[[colname]] syntax, but it does not let me do it for multiple columns:

df
   X1 X2 X3
1:  a  1  3
2:  b  5  3
3:  a  3  4
4:  c  6  5
5:  c  2  2

cnm<-c("X2","X3")

df[[cnm]]

Error in .subset2(x, i, exact = exact) : subscript out of bounds

thanks

Thanks. The first one works but requires converting the data frame into a table... The second one did not work when I tried: `cnm <- c("X2","X3")` then `df[cnm]` gives Error in `[.data.table`(df, cnm) : When i is a data.table (or character vector), x must be keyed (i.e. sorted, and, marked as sorted) so data.table knows which columns to join to and take advantage of x being sorted. Call setkey(x,...) first, see ?setkey. — Zoran Krunic, Sep 14 '16 at 18:59
The second one will not work because your dataset is a `data.table` — akrun, Sep 14 '16 at 19:00

akrun · Answer 1 · 2016-09-14T19:03:08.537

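Based on the OP's printed output (the 1:, 2:, ... row labels are data.table's print style), the dataset seems to be a data.table. To subset data.table columns using a character vector of names, we need with = FALSE: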

df[, cnm, with = FALSE]
#   X2 X3
#1:  1  3
#2:  5  3
#3:  3  4
#4:  6  5
#5:  2  2

According to the ?data.table documentation

with - By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables.

When with=FALSE j is a character vector of column names, a numeric vector of column positions to select or of the form startcol:endcol, and the value returned is always a data.table. with=FALSE is often useful in data.table to select columns dynamically. Note that x[, cols, with=FALSE] is equivalent to x[, .SD, .SDcols=cols].
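For illustration, here is a sketch that rebuilds the question's table and selects the same columns three ways: the with = FALSE form, the .SD/.SDcols form that the documentation calls equivalent, and the .. prefix. It assumes the data.table package is installed; the ..cnm form needs data.table 1.10.2 or later, and the object names by_with, by_sdcols and by_dots are hypothetical.

```r
# Assumption: data.table is installed (1.10.2+ for the ..cnm form)
library(data.table)

# Rebuild the question's table from its printed output
df <- data.table(X1 = c("a", "b", "a", "c", "c"),
                 X2 = c(1, 5, 3, 6, 2),
                 X3 = c(3, 3, 4, 5, 2))
cnm <- c("X2", "X3")

by_with   <- df[, cnm, with = FALSE]    # j is taken as column names, not evaluated in x
by_sdcols <- df[, .SD, .SDcols = cnm]   # .SD restricted to the columns listed in .SDcols
by_dots   <- df[, ..cnm]                # ".." looks cnm up in the calling scope

print(by_with)
```

All three return a new data.table holding only X2 and X3; which one to use is largely a matter of taste.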

If the dataset is a data.frame, just

setDF(df)  # convert to 'data.frame' (in place, by reference)
df[cnm]
#   X2 X3
#1  1  3
#2  5  3
#3  3  4
#4  6  5
#5  2  2

will subset the columns.
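For a plain data.frame, a few base R forms select columns from a character vector. This sketch uses base R only (by_list, by_cols, by_subset and one are hypothetical names) and also shows why drop = FALSE matters with matrix-style indexing when only one name is selected.

```r
# Base R only: rebuild the question's data as a plain data.frame
df <- data.frame(X1 = c("a", "b", "a", "c", "c"),
                 X2 = c(1, 5, 3, 6, 2),
                 X3 = c(3, 3, 4, 5, 2))
cnm <- c("X2", "X3")

by_list   <- df[cnm]                   # list-style indexing: always returns a data.frame
by_cols   <- df[, cnm, drop = FALSE]   # matrix-style indexing; drop = FALSE keeps a data.frame
by_subset <- subset(df, select = cnm)  # select also accepts a character vector

# With a single name and the default drop = TRUE, the result collapses to a vector
one <- df[, "X2"]

print(by_list)
```

The list-style df[cnm] form is the simplest because it never drops to a vector, even when cnm has length one.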

The [[ operator extracts a single column of a data.frame (or a single list element); it does not select several columns at once.
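The specific "subscript out of bounds" error comes from the fact that [[ given a character vector longer than one indexes recursively: df[[c("X2", "X3")]] is read as df[["X2"]][["X3"]], and the unnamed numeric column X2 has no element named "X3". A base R sketch (object names are hypothetical):

```r
# Base R only: rebuild the question's data as a plain data.frame
df <- data.frame(X1 = c("a", "b", "a", "c", "c"),
                 X2 = c(1, 5, 3, 6, 2),
                 X3 = c(3, 3, 4, 5, 2))
cnm <- c("X2", "X3")

col_vec <- df[["X2"]]  # [[ with one name returns the column itself (a numeric vector)
sub_df  <- df[cnm]     # [ with several names returns a smaller data.frame

# On a nested list, recursive [[ indexing is easy to see
nested <- list(a = list(b = 42))
deep   <- nested[[c("a", "b")]]  # same as nested[["a"]][["b"]]

# On the data.frame the recursive lookup fails; capture the error message
res <- tryCatch(df[[cnm]], error = function(e) conditionMessage(e))
print(res)
```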

Applying the OP's code to a data.table gives the same error message:

df[[cnm]]

Error in .subset2(x, i, exact = exact) : subscript out of bounds

If we use the data.frame-style subsetting on a data.table, it does not work either:

df[cnm]

Error in `[.data.table`(df, cnm) : When i is a data.table (or character vector), the columns to join by must be specified either using 'on=' argument (see ?data.table) or by keying x (i.e. sorted, and, marked as sorted, see ?setkey). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
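The error arises because data.table treats a character vector in the i position as values to join on, not as column names. A sketch of what that join looks like when the join column is given with on= (assumes data.table is installed; rows_a is a hypothetical name):

```r
# Assumption: data.table is installed
library(data.table)

df <- data.table(X1 = c("a", "b", "a", "c", "c"),
                 X2 = c(1, 5, 3, 6, 2),
                 X3 = c(3, 3, 4, 5, 2))

# A character i is matched against the values of the join column X1,
# so this returns the rows where X1 == "a" (all columns), not a column subset
rows_a <- df["a", on = "X1"]
print(rows_a)
```

To pick columns rather than rows, the j-based forms (with = FALSE or .SDcols) are the ones to use.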

thanks - I missed setDF(df) ... — Zoran Krunic, Sep 14 '16 at 19:01
