Identify a list of (or remove) variables in a data frame that are completely empty (NAs)

Question

I have a very large data frame with numerous variables that are completely empty (all NAs). My goal is to remove these variables. I want to drop only the variables that are entirely NA, not every variable that has some missing values. This seems like a very basic question, but I cannot figure it out.

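# sample data
A <- rbinom(100, 1, 1/2)
B <- rbinom(100, 1, 1/2)
C <- NA  # recycled by cbind() into a column that is entirely NA
D <- NA
df <- as.data.frame(cbind((1:100), A, B, C, D))
# set a random 0-20% of each column to NA, so the non-empty variables also have missing values
df <- as.data.frame(lapply(df, function(x)
               "is.na<-"(x, sample(seq(x), floor(length(x) * runif(1, 0, .2))))))
Hmisc::describe(df)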

I can make a list of these variables using Hmisc::describe(), but I can't figure out how to extract or use this list.

I didn't think that it would be that difficult to use the example here, even if it referred to rows rather than columns: http://stackoverflow.com/questions/25599139/identifying-rows-in-data-frame-with-only-na-values-in-r. I got 36 hits to a search on `[r] identify columns all NA`. – IRTFM, Dec 04 '15 at 18:49

score 2 · Accepted Answer · answered Dec 04 '15 at 18:30

Try this:

df[,!sapply(df,function(x) all(is.na(x)))]

or, to be extra safe (drop = FALSE keeps the result a data frame even if only one column remains):

df[,!sapply(df,function(x) all(is.na(x))),drop = FALSE]
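
To also get the list of empty variables that the question asks for, the same logical vector can be stored and reused. This is only a minimal sketch on made-up data: the data frame `toy` and the names `empty` and `kept` are illustrative assumptions, not part of the original post.

```r
# toy data (hypothetical): C and D are entirely NA, B is only partly NA
toy <- data.frame(id = 1:4, B = c(1, NA, 0, 1), C = NA, D = NA_real_)

# TRUE for every column whose values are all NA
empty <- sapply(toy, function(x) all(is.na(x)))

# names of the completely empty variables
names(toy)[empty]
# "C" "D"

# data frame without them; drop = FALSE keeps a data frame even for one column
kept <- toy[, !empty, drop = FALSE]
names(kept)
# "id" "B"
```

Note that B survives even though it contains an NA, because only columns with no observed values at all are dropped.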

answered Dec 04 '15 at 18:30

joran

Brilliant! This works. I'll accept it as the answer as soon as the 12-minute cooling-off period has passed. :) – micturalgia Dec 04 '15 at 18:31

score 1 · Answer 2 · answered Dec 04 '15 at 18:32

Try:

apply(df,2,function(x) sum(!is.na(x)))

This counts the non-missing values in each column, so variables that contain only NAs will have a sum of 0.
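
The counts can then be turned into variable names or used to subset the data frame. This is a rough sketch on made-up data: `toy` and `n_obs` are illustrative names, and the `colSums()` line is an alternative added here, not something the answer itself proposes.

```r
# toy data (hypothetical): C and D are entirely NA, B is only partly NA
toy <- data.frame(id = 1:4, B = c(1, NA, 0, 1), C = NA, D = NA_real_)

# number of non-missing values per column (same call as above)
n_obs <- apply(toy, 2, function(x) sum(!is.na(x)))

# variables with no observed values at all
names(n_obs)[n_obs == 0]
# "C" "D"

# keep only the columns with at least one observed value
toy[, n_obs > 0, drop = FALSE]

# colSums() yields the same counts without an anonymous function
colSums(!is.na(toy))
```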

answered Dec 04 '15 at 18:32

Bernardo
