How to drop all variables/columns without content from a data frame in R?

Question

My actual dataset is larger and more complicated (more columns and rows).

Here is a simplified example:

A <- rep(NA,10)
B <- rep(2,10)
C <- rep(NA,10)
D <- rep('B',10)
E <- c('NA',rep('XY',9))

dat <- data.frame(A,B,C,D,E)

    A B  C D  E
1  NA 2 NA B NA
2  NA 2 NA B XY
3  NA 2 NA B XY
4  NA 2 NA B XY
5  NA 2 NA B XY
6  NA 2 NA B XY
7  NA 2 NA B XY
8  NA 2 NA B XY
9  NA 2 NA B XY
10 NA 2 NA B XY

Variables A and C do not contain any data. I would like to drop all variables from the data.frame that contain only NAs, so that only the variables with content remain. dplyr solutions are welcome, but so are others.

I'd do `Filter(function(x) !all(is.na(x)), dat)`, but it seems there is already a topic like yours - [How to delete columns that contain ONLY NAs?](https://stackoverflow.com/questions/15968494/how-to-delete-columns-that-contain-only-nas) — arg0naut91, Oct 01 '20 at 14:58

score 1 · Answer 1 · answered Oct 01 '20 at 14:57

You can do it with dplyr:

library(dplyr)
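# Keep columns with at least one non-NA value; all() rather than any(),
# so columns that are only partly NA are not dropped
dat %>%
    select_if(~ !all(is.na(.)))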
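In dplyr 1.0.0 or later, where `select_if()` is superseded, the same idea can be sketched with `select()` and `where()`. The sketch tests `all()`, on the assumption that columns with only some missing values should be kept. The sample data is rebuilt with a real `NA` in `E` (an assumption; the question's code uses the string 'NA') so that the difference from an `any()` test is visible.

```r
library(dplyr)

# Sample data modelled on the question; E holds a real NA here (assumption)
dat <- data.frame(
  A = rep(NA, 10),
  B = rep(2, 10),
  C = rep(NA, 10),
  D = rep("B", 10),
  E = c(NA, rep("XY", 9))
)

# Keep every column that has at least one non-missing value
res <- dat %>%
  select(where(~ !all(is.na(.x))))

names(res)
```

With an `any()` test instead, `E` would be dropped too, because it contains one missing value.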

Elias

score 1 · Accepted Answer · answered Oct 01 '20 at 15:02

A base R option using colSums + is.na

> dat[colSums(is.na(dat))!=nrow(dat)]
   B D    E
1  2 B <NA>
2  2 B   XY
3  2 B   XY
4  2 B   XY
5  2 B   XY
6  2 B   XY
7  2 B   XY
8  2 B   XY
9  2 B   XY
10 2 B   XY
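One thing to watch for with this approach: `is.na()` only detects real missing values, so a column filled with the string 'NA' (like the first element of `E` in the question's sample code) counts as having content. Below is a minimal sketch, assuming such strings should be treated as missing too. The helper name `drop_empty_cols` and its `na_strings` argument are made up for illustration and do not come from any package.

```r
# Hypothetical helper: drop columns whose values are all missing.
# `na_strings` lists text values to treat as missing (an assumption).
drop_empty_cols <- function(df, na_strings = c("NA", "")) {
  is_empty <- vapply(
    df,
    function(col) all(is.na(col) | as.character(col) %in% na_strings),
    logical(1)
  )
  df[!is_empty]
}

dat <- data.frame(
  A = rep(NA, 10),
  B = rep(2, 10),
  C = rep("NA", 10),         # text "NA", not a real missing value
  D = c("NA", rep("XY", 9))  # one text "NA", otherwise real content
)

names(dat[colSums(is.na(dat)) != nrow(dat)])  # keeps C: its "NA"s are strings
names(drop_empty_cols(dat))                   # drops C as well
```

If the data comes from a file, setting `na.strings` in `read.csv()` at import time avoids the problem at the source.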

ThomasIsCoding

score 0 · Answer 3 · answered Oct 01 '20 at 14:54

Try with an index for variables with NA:

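# Count the NAs in each column
i1 <- apply(dat, 2, function(x) sum(is.na(x)))
# Logical index: TRUE for columns that are entirely NA
i2 <- i1 == nrow(dat)
# Keep the other columns; unlike dat[, -which(...)], a logical index
# still works when no column is all NA
dat <- dat[, !i2, drop = FALSE]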

Output:

   B D  E
1  2 B NA
2  2 B XY
3  2 B XY
4  2 B XY
5  2 B XY
6  2 B XY
7  2 B XY
8  2 B XY
9  2 B XY
10 2 B XY
