Remove Column if all values are either NA or 0 in r

Question

Does anyone know how one would go about removing columns that consist of entirely either NAs or zeros?

For example, given a dataframe like this:

  rv    X1   X2    X3  X4 
1 M     0    110   0   1
2 J     70   200   0   3
3 J     NA   115   NA  4
4 M     65   110   0   9
5 J     70   200   NA  3
6 J     64   115   NA  8

I would want just rv, X1, X2 and X4 columns.

Hint: 1. Run a test on the columns, to see if ‘all’ entries are 0 or NA, 2. Remove these columns. Answers to both of these steps can be found on Stack. — Hector Haffenden, May 11 '19 at 13:44

akrun · Answer 1 · 2019-05-11T14:12:59.580

7

With base R, we can do

Filter(function(x) !all(is.na(x)|x == 0), df)
#  rv X1  X2 X4
#1  M  0 110  1
#2  J 70 200  3
#3  J NA 115  4
#4  M 65 110  9
#5  J 70 200  3
#6  J 64 115  8

Or using colSums

df[!!colSums(replace(df, is.na(df), 0)!= 0)]

Or using sapply to create a logical vector

df[!sapply(df, function(x) all(is.na(x)|x == 0))]

edited May 11 '19 at 14:12

answered May 11 '19 at 13:48

akrun

874,273
37
540
662

Ronak Shah · Accepted Answer · 2019-05-11T14:08:27.527

5

Multiple ways to do this

df[colSums(is.na(df) | df == 0) != nrow(df)]

#  rv X1  X2 X4
#1  M  0 110  1
#2  J 70 200  3
#3  J NA 115  4
#4  M 65 110  9
#5  J 70 200  3
#6  J 64 115  8

Using apply

df[!apply(is.na(df) | df == 0, 2, all)]

Or using dplyr

library(dplyr)
df %>%  select_if(~!all(is.na(.) | . == 0))

edited May 11 '19 at 14:08

answered May 11 '19 at 13:45

Ronak Shah

377,200
20
156
213

Rémi Coulaud · Answer 3 · 2019-05-11T13:48:56.283

0

A way to test one column is :

(sum(is.na(df$X1)) + sum(df$X1 == 0))  == nrow(df)

X1 is the column name.

edited May 11 '19 at 13:48

answered May 11 '19 at 13:45

Rémi Coulaud

1,684
1
8
19

Remove Column if all values are either NA or 0 in r

3 Answers3