2

I want to remove the lines from a data.table that only contain NAs.

> tab = data.table(A = c(1, NA, 3), B = c(NA, NA, 3))
> tab
    A  B
1:  1 NA
2: NA NA
3:  3  3

Normally I would do it with apply(dat, 1, ...) which unfortunately does not work on a data.table but it leads me to this inelegant solution:

> tab[apply(as.data.frame(tab), 1, function(x) !all(is.na(x))), ]
   A  B
1: 1 NA
2: 3  3

How can this be achieved the fastest way without knowing the column names?

jakob-r
  • 6,824
  • 3
  • 29
  • 47

3 Answers3

6

We can use Reduce with is.na and &

tab[!Reduce(`&`, lapply(tab, is.na))]
#   A  B
#1: 1 NA
#2: 3  3

Or a compact but not so efficient approach would be

tab[rowSums(!is.na(tab)) != 0L]

Also, as commented by @Frank, a join based approach,

tab[!tab[NA_integer_], on = names(tab)]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 2
    Probably super inefficient, but there's also `df[!df[NA_integer_], on=names(df)]` -- anti-join on all columns against a row of NAs. – Frank Jul 22 '16 at 20:32
  • Ist there a way to make this code exclude the first column? I have a data.table where I want to remove rows which are completely empty apart from the first column. – Tom Jul 03 '19 at 13:52
  • @Tom Can you please post as a new question – akrun Jul 03 '19 at 13:53
  • https://stackoverflow.com/questions/56871943/remove-rows-which-have-all-nas-apart-from-the-first-column – Tom Jul 03 '19 at 14:06
2

Another idea:

library(dplyr)
df %>% 
  filter(rowSums(is.na(.)) < length(.))
Frank
  • 66,179
  • 8
  • 96
  • 180
Steven Beaupré
  • 21,343
  • 7
  • 57
  • 77
0

I quite like

tab <- tab[sapply(1:nrow(tab), function(i){!all(is.na(tab[i,]))}),]

It is intuitive to me, but I'm not sure it is the fastest approach.

HTH

nncho
  • 165
  • 7