
I use the following idiom for conditionally selecting columns from a data.frame:

DF = data.frame(a = 1:3,b = letters[1:3],c = LETTERS[1:3])
someCondition <- FALSE

# use `if(someCondition)` to conditionally include column 'c'
DF[,c('a','b',if(someCondition)'c')] 
:>   a b
:> 1 1 a
:> 2 2 b
:> 3 3 c

but the equivalent does not work with data.tables, because NULL values are not dropped from lists the way they are dropped by concatenation with c():

library(data.table)
DT = as.data.table(DF)
DT[,.(a,b,if(someCondition)c)]
:> Error in setnames(jval, jvnames) : 
:>   Can't assign 3 names to a 2 column data.table
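
The difference between c() and list() behind this error can be reproduced in base R alone. The following is a minimal sketch; the variable names `chr` and `lst` are only illustrative:

```r
# `if (FALSE) "c"` evaluates to NULL.

# c() concatenates its arguments, so the NULL contributes nothing.
chr <- c("a", "b", if (FALSE) "c")
length(chr)        # 2

# list() keeps every argument as its own element, including the NULL.
lst <- list("a", "b", if (FALSE) "c")
length(lst)        # 3
is.null(lst[[3]])  # TRUE
```

This is presumably why the `.()` call ends up with three names but only two usable columns.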

I've defined a function called .. which is a workaround:

.. <- function(...){
    x = list(...)
    x= x[!sapply(x,is.null)]
    x
}
DT[,..(a,b,if(someCondition)c)]
:>    V1 V2
:> 1:  1  a
:> 2:  2  b
:> 3:  3  c

but it seems kind of kludgy to have to include my own function to accomplish such a common operation. Is there a more idiomatic way of conditionally selecting columns from a data.table?

Jthorpe
  • 7
    `\`[.noquote\`(DT, c('a','b', if (someCondition) 'c'))` or `DT[, c('a','b', if (someCondition) 'c'), with=FALSE]` if you can't appreciate the wonders of `[.noquote`. – Frank Oct 26 '15 at 17:49
  • 2
    Not sure why this is marked as duplicate.. Please file a bug report. – Arun Oct 26 '15 at 17:56
  • @akrun, clearly `DT[, .(a, b, if(someCondition) c)]` doesn't work as it should. Why would it be a duplicate? – Arun Oct 26 '15 at 18:04
  • @Jthorpe, existence of an alternate approach doesn't make this issue go away. – Arun Oct 26 '15 at 18:04
  • Note that `DT[, .(a, b, if(TRUE) c)]` does work, but without the name on `c` – Rich Scriven Oct 26 '15 at 18:05
  • @akrun, clearly the solution is *quite different*. – Arun Oct 26 '15 at 18:05
  • 1
    @Arun I don't know that it *should* work. Should the list in `j` behave differently from other lists by dropping NULL elements? (I'm looking at `list("a","b", if(FALSE) "c")`.) Seems like it could lead to unexpected behavior in other use cases (that I can't think of...). Anyway, I see Jason has posted the bug report, so I'll discuss there if I think of anything more. – Frank Oct 26 '15 at 18:15
  • 1
    There's nothing different here. NULL elements could/should be ignored. It's already happening in operations of the form `DT[, if (condition) .SD, by=.]`, for example. On unexpectedness, we've tests to catch those cases. We'll address it if they pop up. – Arun Oct 26 '15 at 18:20
  • 1
    @Arun -- That's absolutely the key point. Also worth reminding folks that the `with=FALSE` alternative really does not provide a drop-in replacement for `with=TRUE`. Something as simple as this of course fails: `DT[, c(3*'a', 2*'b', if (someCondition) 'c'), with=FALSE]`. (For the record, and in case it matters to anybody, I'm the one who reversed the closure.) – Josh O'Brien Oct 26 '15 at 18:25
  • @JoshO'Brien, absolutely! Thanks for adding that point in. – Arun Oct 26 '15 at 18:27
  • @arun Does my proposed solution run against any data.table rules of style or the like? perhaps unnecessary copying? – lmo Apr 09 '16 at 18:43
  • @lmo, looks great! Selecting columns copies in data.table currently, doesn't matter with `.SD` or `with=FALSE` (and it is that way because data.tables are designed to not do a lot of *select*). When `shallow()` will be exported, these operations will get even more efficient. But I won't be working on it anytime soon.. (especially since R v3.3.0 does better reference counting IIUC from what I've heard).. – Arun Apr 10 '16 at 09:16
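
The `with=FALSE` form suggested in the comments above can be sketched as follows. This is an illustration only: it rebuilds the question's table, and the variable names `cols` and `res` are assumptions, not part of the original thread.

```r
library(data.table)

DT <- data.table(a = 1:3, b = letters[1:3], c = LETTERS[1:3])
someCondition <- FALSE

# Build the column names as a character vector; c() drops the NULL
# produced by `if (someCondition) "c"` when the condition is FALSE.
cols <- c("a", "b", if (someCondition) "c")

# with = FALSE makes data.table treat j as column names
# rather than as an expression to evaluate inside the table.
res <- DT[, cols, with = FALSE]
names(res)  # "a" "b"
```

As noted in the comments, this only selects existing columns by name; it does not replace `.()` when the columns are computed expressions.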

1 Answer


I think the .SDcols argument does what you want. For the data.table DT in the example above,

DT[, .SD, .SDcols = c("a", "b", if (someCondition) "c")]

acts the same way as the data.frame idiom. You can also spell out both branches explicitly:

DT[, .SD, .SDcols = if (someCondition) c("a", "b", "c") else c("a", "b")]

which performs the same selection. You could build more elaborate true and false vectors on a previous line (though that might defeat the purpose of keeping things succinct).
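
A self-contained sketch of the .SDcols approach, run with the condition set both ways. It rebuilds the question's table; the names `without_c` and `with_c` are only illustrative.

```r
library(data.table)

DT <- data.table(a = 1:3, b = letters[1:3], c = LETTERS[1:3])

# Condition off: c() drops the NULL, so only a and b are selected.
someCondition <- FALSE
without_c <- DT[, .SD, .SDcols = c("a", "b", if (someCondition) "c")]
names(without_c)  # "a" "b"

# Condition on: column c is included as well.
someCondition <- TRUE
with_c <- DT[, .SD, .SDcols = c("a", "b", if (someCondition) "c")]
names(with_c)     # "a" "b" "c"
```

Unlike the `..()` workaround in the question, the selected columns keep their original names here.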

lmo
  • .SDcols does not accept x.colname and i.colname when we have a non-equi join. Is that a bug? – Lazarus Thurston Jul 06 '22 at 16:41
  • 1
    There are some limitations to .SDcols. What you mention is not a bug, per se, but adding that as a feature might be pretty cool. You could suggest this on their GitHub page if you want, as `data.table` is still (thankfully) in active development. – lmo Oct 01 '22 at 16:06