5

I have a sample dataset which looks something similar to the one below:

d= data.frame(a = c(1,5,56,4,9), 
              b = c(0,0,NA,0,NA), 
              c = c(98,67,NA,3,7), 
              d = c(0,0,0,0,0), 
              e = c(NA,NA,NA,NA,NA))

which would be:


| a  |  b |  c | d | e  |
|----|:--:|---:|---|----|
| 1  |  0 | 98 | 0 | NA |
| 5  |  0 | 67 | 0 | NA |
| 56 | NA | NA | 0 | NA |
| 4  | 0  | 3  | 0 | NA |
| 9  | NA | 7  | 0 | NA |

I need to remove all such columns which have:

1. NA's and Zeros
2. Only Zeros
3. Only NA's

So based on the above dataset, columns b,d and e should be eliminated. So, I first need to find out which columns have such conditions and then delete them.

I went through this link Remove the columns with the colsums=0 but I'm not clear with the solution. Also, it doesn't provide me the desired output.

The final output would be:

| a  |  c |
|----|:--:|
| 1  | 98 |
| 5  | 67 |
| 56 | NA |
| 4  | 3  |
| 9  | 7  |
M--
  • 25,431
  • 8
  • 61
  • 93
hk2
  • 487
  • 3
  • 15

4 Answers4

4

One option would be to create a logical vector with colSums based on the number of NA or 0 elements in each column

d[!colSums(is.na(d)|d ==0) == nrow(d)]
#  a    c
#1  1   98
#2  5   67
#3 56   NA
#4  4    3
#5  9    7

Or another option is to replace all the 0s to NA and then apply is.na

d[colSums(!is.na(replace(d, d == 0, NA))) > 0]

Or more compactly with na_if

d[colSums(!is.na(na_if(d, 0))) > 0]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Well, I have tried this on my real dataset. I don't know why, but it does not remove the columns which have both NA's and zeros. Even in the above sample dataset, it does not delete the column e. The other approach I could think of was to replace all the NA's with zeros and then remove the columns with all zeros, but that too doesn't seem to work. – hk2 Sep 24 '19 at 20:32
  • @hk2 I get the expected output based on your example. May be you used the earlier version – akrun Sep 24 '19 at 20:34
2

In base and assuming that we have different type of columns:

as.data.frame(Filter(function(x) !all(x=="NA" | x == "0"), {lapply(d, as.character)}))

Using dplyr:

library(dplyr)

d %>% 
  mutate_all(as.character) %>% 
  select(which(colSums(abs(.), na.rm = T) != 0))

Output:

#>    a  c
#> 1  1 98
#> 2  5 67
#> 3 56 NA
#> 4  4  3
#> 5  9  7
M--
  • 25,431
  • 8
  • 61
  • 93
  • This would be true if all the columns are numeric. However, if some of those columns are of different data types such as character and date, how will the above code be modified? – hk2 Sep 24 '19 at 20:46
  • @hk2 please edit your dataset and make it representative of your actual dataset https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – M-- Sep 24 '19 at 20:50
2

We can use apply column-wise and remove columns which has all, NA or 0's.

d[!apply(d == 0 | is.na(d), 2, all)]

#   a  c
#1  1 98
#2  5 67
#3 56 NA
#4  4  3
#5  9  7
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

Very strange to store NAs and 0 as strings, but there you go...

bad_column <- function(z) {
  all(z %in% c("NA", "0"))
}

d[, !sapply(d, bad_column), drop = FALSE]
Michael M
  • 880
  • 7
  • 10