
I need to remove every variable (column) that has missing observations, and also every variable whose value is zero for all observations in my data frame.

This isn't working: `data[, !sapply(data, function(x) any(is.na(x)))]`

Sample for illustration purposes:

x y z a
. 3 0 1
. 4 0 2
2 3 0 3

So here I need to remove the x variable because it has missing observations. Next, I also need to delete the z variable, since it is 0 for all observations.

Thank you.

agenis
icychamp
  • You can change your code to `df1[!sapply(df1, function(x) any(x==".")|all(x==0))]` as there are no `NA` elements based on the example shown – akrun Sep 21 '16 at 08:40
  • Similarly `data[(colSums(data == ".") == 0) & (colSums(data == 0) < nrow(data))]` – David Arenburg Sep 21 '16 at 09:07
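The comments above rest on `"."` being a plain string rather than `NA`, which is why the `is.na()` filter in the question removes nothing. A minimal sketch of the alternative, assuming the data come from a text source where `.` marks a missing value (the inline text, `df1`, `drop_col`, and `result` are illustrative names, not from the original post): reading with `na.strings = "."` yields real `NA` values, after which a single filter can handle both conditions.

```r
# Sample data from the question; "." is converted to NA while reading
df1 <- read.table(text = "x y z a
. 3 0 1
. 4 0 2
2 3 0 3", header = TRUE, na.strings = ".")

# TRUE for columns that contain any NA or that consist only of zeros
drop_col <- sapply(df1, function(x) any(is.na(x)) || all(x == 0, na.rm = TRUE))

# drop = FALSE keeps a data frame even if only one column survives
result <- df1[, !drop_col, drop = FALSE]
print(names(result))  # "y" "a"
```

If the data frame is already loaded with `"."` strings, the string comparisons in the comments are the more direct route.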

1 Answer


To remove the columns with zeros, I use this function a lot:

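RemoveZeros <- function(data, proportion = 0) {
  # Share of zeros in one column; NA values are not counted as zeros
  NullRatio <- function(x) sum(x == 0, na.rm = TRUE) / length(x)
  # Logical vector: TRUE for columns whose zero share is within the limit
  to_keep <- vapply(data, NullRatio, numeric(1)) <= proportion
  message("removed columns: ", paste(names(data)[!to_keep], collapse = ", "))
  # drop = FALSE returns a data frame even when a single column remains
  data[, to_keep, drop = FALSE]
}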

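The `proportion` argument sets the maximum share of zeros a column may have and still be kept. The default of 0 keeps only columns with no zeros at all, so set `proportion` just below 1 (e.g. 0.99) to drop only the all-zero columns. The function also prints the names of the removed columns.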
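For all-numeric data, the same threshold idea can be sketched without a per-column helper by averaging a logical matrix. This is only an illustration: `df2`, `zero_share`, `max_zero_share`, and `kept` are made-up names, and the example data are invented to show a column that is mostly, but not entirely, zero.

```r
# Invented example: z is all zeros, b is two-thirds zeros
df2 <- data.frame(y = c(3, 4, 3), z = c(0, 0, 0), b = c(0, 0, 5))

# df2 == 0 is a logical matrix; colMeans gives the share of TRUE per column
# (with na.rm = TRUE the share is taken over the non-missing values only)
zero_share <- colMeans(df2 == 0, na.rm = TRUE)

# Keep columns whose share of zeros does not exceed the chosen limit
max_zero_share <- 0.5
kept <- df2[, zero_share <= max_zero_share, drop = FALSE]
print(names(kept))  # "y"
```

Raising `max_zero_share` to 0.99 would keep `b` and drop only `z`.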

agenis