I have some data:

library(data.table)
data(mtcars)
setDT(mtcars)

I have some vectors of names of columns I would like to keep:

some_keep <- c('mpg','cyl')
more_keep <- c('disp','hp')

I want to drop-in-place every column except those named in some_keep and more_keep.

I know that I can use setdiff with names:

mtcars[, ( setdiff(names(mtcars),c(some_keep,more_keep)) ) := NULL] # this works

But this is not very readable. I know that to select all but these columns, I can negate the names and use with=FALSE:

mtcars[,-c(some_keep,more_keep), with=FALSE] # returns the columns I want to drop

But then this doesn't work:

mtcars[,(-c(some_keep,more_keep)):=NULL] # invalid argument to unary operator

Nor do these:

mtcars[,-c(some_keep,more_keep)] <- NULL # invalid argument to unary operator
mtcars[,-c(some_keep,more_keep), with=FALSE] <- NULL # unused argument (with = FALSE)

Is there a simpler data.table expression that doesn't require writing the table's name twice?

Please note that this seemingly duplicate question is actually about selecting (not dropping) all but the specified columns, as shown above.

C8H10N4O2
  • I mentioned this on a similar FR a few months ago: https://github.com/Rdatatable/data.table/issues/1710#issuecomment-280684187 – Frank Aug 01 '17 at 20:21
  • Not dropping in place, but `mtcars[, .SD, .SDcols=c(some_keep, more_keep)]` returns the desired result. – lmo Aug 01 '17 at 20:24
  • What if we took your working code and made it readable: `drop_cols = setdiff(names(mtcars), c(some_keep, more_keep))` and then `mtcars[, drop_cols := NULL] `. – Gregor Thomas Aug 01 '17 at 20:31
  • Really, I think I'm just going to vote to close as opinion-based. You've got a single-line solution that works well. You can make it more readable by breaking it into two lines. Neither "elegant" nor "readable" is an objective criterion for evaluating other methods. – Gregor Thomas Aug 01 '17 at 20:33
  • I agree with Gregor's suggestion and do it that way myself. There's a typo in it, though: it should be like `(drop_cols) :=`. I have no opinion on close or leave open, though I think it *could* be made objective: "How can I select columns to keep by reference without writing the table's name twice?" The answer would probably be to create a wrapper function, `keepcols <- function(DT, k, d = setdiff(names(DT), k)) DT[, (d) := NULL ][]` – Frank Aug 01 '17 at 20:45
  • @Frank thanks, I feel validated. I'll write a wrapper function. I don't have the fortitude to attempt a syntax change and pull request – C8H10N4O2 Aug 01 '17 at 20:54
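
For reference, the wrapper idea from the comments could be sketched roughly as follows. This is only an illustration: `keep_cols` is a hypothetical helper name, not a data.table function, and the guard against an empty drop list is an added assumption rather than something from the thread. It works on a copy made with `as.data.table()` so the built-in `mtcars` is left untouched.

```r
library(data.table)

# keep_cols is a hypothetical helper, not part of data.table.
# It drops, by reference, every column of DT not named in `keep`.
keep_cols <- function(DT, keep) {
  drop_cols <- setdiff(names(DT), keep)
  # Only assign when there is something to drop
  if (length(drop_cols) > 0L) DT[, (drop_cols) := NULL]
  invisible(DT)
}

DT <- as.data.table(mtcars)
some_keep <- c('mpg', 'cyl')
more_keep <- c('disp', 'hp')

# The table's name is written once at the call site
keep_cols(DT, c(some_keep, more_keep))
names(DT)
```

Because `:=` modifies the table by reference, the caller's `DT` is changed in place and there is no need to reassign the result.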
