I have a data.table in which some columns contain only NAs. I want to remove these columns.

I tried this, but it doesn't seem to work for the data.table class.

edit:

library('data.table')
dat = data.table(a = rep(NA, 10), b = 1:10)
dat
     a  b
 1: NA  1
 2: NA  2
 3: NA  3
 4: NA  4
 5: NA  5
 6: NA  6
 7: NA  7
 8: NA  8
 9: NA  9
10: NA 10
dat[, ( colSums(is.na(dat)) != nrow(dat) ) := NULL]

Error in [.data.table(dat, , :=((colSums(is.na(dat)) != nrow(dat)), : LHS of := isn't column names ('character') or positions ('integer' or 'numeric')

Roy C
  • Combining the syntax of my link with yours, you get `DT[, ( colSums(is.na(DT)) != nrow(DT) ) := NULL]` Read the package vignettes for more on data.table's syntax differences from vanilla data.frames. https://github.com/Rdatatable/data.table/wiki/Getting-started With regard to adding/removing/modifying columns, it's not just a syntax difference, it's also more efficient. – Frank May 10 '16 at 19:32
  • @rawr Nice find :) You could comment on his answer instead. He'll probably fix it and was used to a different idiom for setting seeds (matlab?) – Frank May 10 '16 at 19:34
  • @Frank I tried this but got an error, please see my edit above. – Roy C May 10 '16 at 19:39
  • Ah, my bad. Following the advice of the error message, you can wrap in `which` to get column numbers `dat[, which( colSums(is.na(dat)) != nrow(dat) ) := NULL]` – Frank May 10 '16 at 19:45
  • or `dat[, (colSums(is.na(dat)) != nrow(dat)), with = FALSE]` @Frank ? – rawr May 10 '16 at 19:47
  • I wouldn't say this question is a duplicate. This one seems like a single-use case, while the other is for a more flexible function. Since the OP gets an error, I might suggest using the base R `subset()` function to exclude the known all-NA columns here, like `subset(DT, select = -nas)` – GlennFriesen May 10 '16 at 19:47
  • @Frank Now it works. But in the opposite way. I guess it should be `dat[, which( colSums(is.na(dat)) == nrow(dat) ) := NULL]`. Thanks a lot! – Roy C May 10 '16 at 19:48
  • @Glenn Yeah, I can see your point. If someone else wants to un-dupe it, that's fine by me. By the way, one of the main points of the data.table package is to modify data by reference. `subset` isn't consistent with that philosophy, since it creates a new object instead of "deleting" the columns in the original object (per the OP's title). – Frank May 10 '16 at 19:55
  • For a quick Google search for this specific question (how I got here) this is not a dupe. I'd have to read quite a bit into the other question to understand the NA case. Could someone with rights un-dupe it? Also, @Frank why don't you make your comment an answer? That's where I look for a quick solution. – Jakob Jun 06 '21 at 08:17
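
Pulling the comments together, here is a minimal sketch of the two approaches discussed above, run on the `dat` example from the question. The variable names `all_na`, `keep_cols`, `kept` and `drop_cols` are made up for illustration and do not appear in the thread. The deletion step passes column names to `:=` rather than using the `which()` form from the comments; both should satisfy the error message, which asks for names or positions on the left-hand side.

```r
library(data.table)

dat <- data.table(a = rep(NA, 10), b = 1:10)

# TRUE for each column whose values are all NA
all_na <- colSums(is.na(dat)) == nrow(dat)

# Option 1: build a new data.table from the remaining columns (dat is unchanged)
keep_cols <- names(dat)[!all_na]
kept <- dat[, keep_cols, with = FALSE]

# Option 2: delete by reference. := accepts column names or positions,
# not a logical vector, so translate the logical vector into names first.
drop_cols <- names(dat)[all_na]
if (length(drop_cols) > 0) dat[, (drop_cols) := NULL]

names(dat)   # "b"
names(kept)  # "b"
```

Option 2 is the one that matches the question as asked, since it removes the columns from `dat` itself instead of creating a second object. The `length()` guard is only there so the sketch does nothing when no column is entirely NA.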