My real dataset is larger and more complicated (more columns and rows).

Here is a simpler example:

A <- rep(NA,10)
B <- rep(2,10)
C <- rep(NA,10)
D <- rep('B',10)
E <- c('NA',rep('XY',9))

dat <- data.frame(A,B,C,D,E)

    A B  C D  E
1  NA 2 NA B NA
2  NA 2 NA B XY
3  NA 2 NA B XY
4  NA 2 NA B XY
5  NA 2 NA B XY
6  NA 2 NA B XY
7  NA 2 NA B XY
8  NA 2 NA B XY
9  NA 2 NA B XY
10 NA 2 NA B XY

Variables A and C do not contain any data. I would like to drop all variables from the data.frame that contain only NAs, so that the variables with content remain. dplyr solutions are welcome, but so are others.

SDahm
    I'd do `Filter(function(x) !all(is.na(x)), dat)`, but it seems there is already a topic like yours - [How to delete columns that contain ONLY NAs?](https://stackoverflow.com/questions/15968494/how-to-delete-columns-that-contain-only-nas) – arg0naut91 Oct 01 '20 at 14:58

3 Answers

You can do it with dplyr:

library(dplyr)
dat %>%
    select_if(~ !all(is.na(.)))  # keep columns that are not entirely NA
Elias

A base R option using colSums + is.na

> dat[colSums(is.na(dat))!=nrow(dat)]
   B D    E
1  2 B <NA>
2  2 B   XY
3  2 B   XY
4  2 B   XY
5  2 B   XY
6  2 B   XY
7  2 B   XY
8  2 B   XY
9  2 B   XY
10 2 B   XY
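
For reuse, the same idea can be wrapped in a small function. This is only a sketch: `drop_all_na_cols` is a hypothetical name, and the data below assumes `E` was meant to hold a real missing value (`NA`) rather than the string `'NA'`, since `is.na('NA')` is `FALSE`.

```r
# Rebuild the example data. E uses a real NA here, which is an
# assumption about what the question intended.
dat <- data.frame(
  A = rep(NA, 10),
  B = rep(2, 10),
  C = rep(NA, 10),
  D = rep("B", 10),
  E = c(NA, rep("XY", 9))
)

# Hypothetical helper: keep every column with at least one non-NA value.
drop_all_na_cols <- function(df) {
  keep <- colSums(!is.na(df)) > 0   # TRUE where a column has any real value
  df[, keep, drop = FALSE]          # drop = FALSE keeps a data.frame even for one column
}

res <- drop_all_na_cols(dat)
names(res)
```

Here `A` and `C` are dropped, while `E` stays because only its first value is missing.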
ThomasIsCoding

Try building an index of the columns that contain only NA:

# Count the NAs in each column
i1 <- apply(dat, 2, function(x) sum(is.na(x)))
# Keep the columns that are not entirely NA
dat <- dat[, i1 != nrow(dat), drop = FALSE]

Output:

   B D  E
1  2 B NA
2  2 B XY
3  2 B XY
4  2 B XY
5  2 B XY
6  2 B XY
7  2 B XY
8  2 B XY
9  2 B XY
10 2 B XY
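
One thing to watch for when dropping columns by position: a negative index built with `which()` is empty when no column is all NA, and an empty negative index selects zero columns rather than all of them. The sketch below contrasts this with a logical index, using a hypothetical `dat2` that has no all-NA column.

```r
# Hypothetical data with no all-NA column, for illustration only.
dat2 <- data.frame(B = rep(2, 3), D = rep("B", 3))

n_na <- colSums(is.na(dat2))        # NA count per column
idx  <- which(n_na == nrow(dat2))   # empty, since no column is all NA

# Negative empty index: selects no columns at all.
by_position <- dat2[, -idx, drop = FALSE]

# Logical index: keeps every column that is not entirely NA.
by_logical <- dat2[, n_na != nrow(dat2), drop = FALSE]

ncol(by_position)
ncol(by_logical)
```

The positional version returns a data.frame with no columns, while the logical version keeps both `B` and `D`.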
Duck