
I have a data frame with some columns containing all NA and I want to get a vector of column indices that contain all NA. For example:

   A  B  C  D  E  F  G
1  4  5  3 NA  9 NA NA
2  8  9  7 NA  9  9 NA
3  1  1  6 NA  5  3 NA

This should give [4 7], since the 4th and 7th columns contain only NA.

Zheyuan Li
prre72

3 Answers


You can use the nearZeroVar function from caret.

# set freqCut to 100/0, default is 95/5
caret::nearZeroVar(df1, freqCut = 100/0)
[1] 4 7

Alternatively, using which, as suggested by 李哲源:

# option 1
which(colSums(sapply(df1, is.na)) == nrow(df1))
D G 
4 7 

# option 2
which(colSums(!is.na(df1)) == 0)
D G 
4 7 
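
To try the which approach end to end, the example from the question can be rebuilt first. This is only a sketch: the answer never shows how df1 was created, so the construction below is an assumption based on the table in the question.

```r
# Assumed reconstruction of the question's data; the name df1 matches the code above.
df1 <- data.frame(
  A = c(4, 8, 1),
  B = c(5, 9, 1),
  C = c(3, 7, 6),
  D = c(NA, NA, NA),
  E = c(9, 9, 5),
  F = c(NA, 9, 3),
  G = c(NA, NA, NA)
)

# !is.na(df1) is a logical matrix; a column sum of 0 means no non-missing values.
all_na <- which(colSums(!is.na(df1)) == 0)
all_na

# Drop the names if only the bare indices are wanted.
unname(all_na)
```

Column F contains one NA but is not reported, because it still has two non-missing values.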

Benchmark:

microbenchmark::microbenchmark(caret = caret::nearZeroVar(df1, freqCut = 100/0),
                               which1 = which(colSums(sapply(df1, is.na)) == nrow(df1)),
                               which2 = which(colSums(!is.na(df1)) == 0))


Unit: microseconds
   expr      min        lq       mean   median        uq       max neval
  caret 1092.459 1109.8670 1266.86065 1130.494 1166.1870 13563.868   100
 which1   29.843   34.0850   39.03823   38.473   42.1310   110.885   100
 which2   21.358   24.5765   28.99438   29.111   32.7685    52.663   100

Of the three, which option 2 is the fastest overall, and both which variants are far faster than caret.

phiver

How about:

which( sapply( DF, function(x) all(is.na(x)) ) )

The is.na function returns TRUE or FALSE for each element, indicating whether it is missing. The all function then returns TRUE if and only if all of those values are TRUE. sapply applies this function to each column of the data frame and returns a logical vector, and which converts that logical vector into the indices of the columns that are entirely NA.
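
The steps described above can be traced one at a time. This is a small sketch under the assumption that DF holds the data from the question; the name col_all_na is introduced here purely for illustration.

```r
# Assumed data, rebuilt from the table in the question.
DF <- data.frame(
  A = c(4, 8, 1), B = c(5, 9, 1), C = c(3, 7, 6),
  D = c(NA, NA, NA), E = c(9, 9, 5),
  F = c(NA, 9, 3), G = c(NA, NA, NA)
)

# Step 1: is.na works element-wise on a single column.
is.na(DF$F)        # TRUE FALSE FALSE

# Step 2: all collapses that to one value per column.
all(is.na(DF$F))   # FALSE, because F has non-missing values
all(is.na(DF$D))   # TRUE

# Step 3: sapply repeats steps 1-2 for every column, giving a named logical vector.
col_all_na <- sapply(DF, function(x) all(is.na(x)))

# Step 4: which returns the positions of the TRUE entries.
which(col_all_na)
```

If the data frame might have zero columns, vapply(DF, function(x) all(is.na(x)), logical(1)) is a stricter alternative, since it always returns a logical vector.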

Greg Snow

Here is an option using tidyverse

library(tidyverse)
df %>%
  map_lgl(~ all(is.na(.x))) %>% 
  which
#  D G 
#  4 7 
akrun