Remove columns with NA's and/or Zeros Only

Question

I have a sample dataset which looks something similar to the one below:

d= data.frame(a = c(1,5,56,4,9), 
              b = c(0,0,NA,0,NA), 
              c = c(98,67,NA,3,7), 
              d = c(0,0,0,0,0), 
              e = c(NA,NA,NA,NA,NA))

which would be:


| a  |  b |  c | d | e  |
|----|:--:|---:|---|----|
| 1  |  0 | 98 | 0 | NA |
| 5  |  0 | 67 | 0 | NA |
| 56 | NA | NA | 0 | NA |
| 4  | 0  | 3  | 0 | NA |
| 9  | NA | 7  | 0 | NA |

I need to remove all such columns which have:

1. NA's and Zeros
2. Only Zeros
3. Only NA's

So based on the above dataset, columns b,d and e should be eliminated. So, I first need to find out which columns have such conditions and then delete them.

I went through this link Remove the columns with the colsums=0 but I'm not clear with the solution. Also, it doesn't provide me the desired output.

The final output would be:

| a  |  c |
|----|:--:|
| 1  | 98 |
| 5  | 67 |
| 56 | NA |
| 4  | 3  |
| 9  | 7  |

Thank you. I tested your data and it seems to be working fine for me to return the expected output — akrun, Sep 24 '19 at 20:44

akrun · Accepted Answer · 2019-09-24T20:44:00.177

4

One option would be to create a logical vector with colSums based on the number of NA or 0 elements in each column

d[!colSums(is.na(d)|d ==0) == nrow(d)]
#  a    c
#1  1   98
#2  5   67
#3 56   NA
#4  4    3
#5  9    7

Or another option is to replace all the 0s to NA and then apply is.na

d[colSums(!is.na(replace(d, d == 0, NA))) > 0]

Or more compactly with na_if

d[colSums(!is.na(na_if(d, 0))) > 0]

edited Sep 24 '19 at 20:44

answered Sep 24 '19 at 20:26

akrun

874,273
37
540
662

Well, I have tried this on my real dataset. I don't know why, but it does not remove the columns which have both NA's and zeros. Even in the above sample dataset, it does not delete the column e. The other approach I could think of was to replace all the NA's with zeros and then remove the columns with all zeros, but that too doesn't seem to work. – hk2 Sep 24 '19 at 20:32
@hk2 I get the expected output based on your example. May be you used the earlier version – akrun Sep 24 '19 at 20:34

M-- · Answer 2 · 2019-09-24T21:05:42.350

2

In base and assuming that we have different type of columns:

as.data.frame(Filter(function(x) !all(x=="NA" | x == "0"), {lapply(d, as.character)}))

Using dplyr:

library(dplyr)

d %>% 
  mutate_all(as.character) %>% 
  select(which(colSums(abs(.), na.rm = T) != 0))

Output:

#>    a  c
#> 1  1 98
#> 2  5 67
#> 3 56 NA
#> 4  4  3
#> 5  9  7

edited Sep 24 '19 at 21:05

answered Sep 24 '19 at 20:27

M--

25,431
8
61
93

This would be true if all the columns are numeric. However, if some of those columns are of different data types such as character and date, how will the above code be modified? – hk2 Sep 24 '19 at 20:46
@hk2 please edit your dataset and make it representative of your actual dataset https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – M-- Sep 24 '19 at 20:50

score 2 · Answer 3 · answered Sep 25 '19 at 01:54

2

We can use apply column-wise and remove columns which has all, NA or 0's.

d[!apply(d == 0 | is.na(d), 2, all)]

#   a  c
#1  1 98
#2  5 67
#3 56 NA
#4  4  3
#5  9  7

answered Sep 25 '19 at 01:54

Ronak Shah

377,200
20
156
213

score 1 · Answer 4 · answered Sep 24 '19 at 20:28

1

Very strange to store NAs and 0 as strings, but there you go...

bad_column <- function(z) {
  all(z %in% c("NA", "0"))
}

d[, !sapply(d, bad_column), drop = FALSE]

answered Sep 24 '19 at 20:28

Michael M

880
7
10

The post has been edited. I have changed them to be numeric. – hk2 Sep 24 '19 at 20:43

Remove columns with NA's and/or Zeros Only

4 Answers4