1

How do I select columns that don't contain any NA values in R? If a column contains at least one NA, I want to exclude it. What's the best way to do this? I have tried using sum(is.na(x)), but haven't been successful.

Also, another R question: is it possible to exclude columns in which every value is the same? For example,

       column1  column2
row1   a        b
row2   a        c
row3   a        c

My purpose is to exclude column1 from my matrix so the final result is:

       column2
row1   b
row2   c
row3   c
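For reference, the second request can be sketched as below. This is a minimal illustration, not a definitive method: the matrix `m` and the helper vector `varies` are hypothetical names, and the sketch assumes a column counts as constant when it has exactly one distinct value (note that `unique()` treats NA as a value of its own).

```r
# Hypothetical matrix matching the example in the question.
m <- matrix(c("a", "a", "a", "b", "c", "c"), nrow = 3,
            dimnames = list(c("row1", "row2", "row3"),
                            c("column1", "column2")))

# TRUE for every column that has more than one distinct value.
varies <- apply(m, 2, function(col) length(unique(col)) > 1)

# drop = FALSE keeps the result a matrix even when one column remains.
result <- m[, varies, drop = FALSE]
result
```

Here only column2 survives, because column1 holds the single value "a".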
mustaccio
user697911
  • See http://stackoverflow.com/questions/2643939/remove-columns-from-dataframe-where-all-values-are-na/12614723#12614723 and modify appropriately. – mnel Jun 13 '14 at 04:44

3 Answers

4

The question "Remove columns from dataframe where ALL values are NA" deals with the case where all values in a column are NA.

For a matrix, you can use colSums(is.na(x)) to find out which columns contain NA values.

Given a matrix x,

x[, !colSums(is.na(x)), drop = FALSE]

will subset appropriately.
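As a small worked example of the matrix case (the matrix contents and the names `na_counts` and `x_clean` are made up for illustration):

```r
# Hypothetical 3 x 3 matrix; only column "b" contains an NA.
# matrix() fills column by column, so b = c(4, NA, 6).
x <- matrix(c(1, 2, 3,
              4, NA, 6,
              7, 8, 9),
            nrow = 3,
            dimnames = list(NULL, c("a", "b", "c")))

# Number of NA values per column: a = 0, b = 1, c = 0.
na_counts <- colSums(is.na(x))

# `!` coerces the counts to logical: 0 becomes TRUE, any other count FALSE.
x_clean <- x[, !na_counts, drop = FALSE]
colnames(x_clean)  # "a" "c"
```

Writing `colSums(is.na(x)) == 0` instead of `!colSums(is.na(x))` selects the same columns and may read more explicitly.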

For a data.frame, it will be more efficient to use lapply or sapply with the function anyNA:

xdf[, sapply(xdf, Negate(anyNA)), drop = FALSE]
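A minimal sketch of the data.frame case follows. The data frame contents are invented for illustration, and `vapply` is swapped in for `sapply` here only because it guarantees a logical result; the selection logic is the same. The efficiency gain presumably comes from `anyNA` being able to stop at the first NA in a column, whereas `is.na()` on a data.frame builds a full logical matrix first.

```r
# Hypothetical data frame; only the "score" column contains an NA.
xdf <- data.frame(id = 1:3,
                  score = c(10, NA, 30),
                  label = c("u", "v", "w"),
                  stringsAsFactors = FALSE)

# One logical per column: TRUE when the column has no NA at all.
keep <- vapply(xdf, function(col) !anyNA(col), logical(1))

# drop = FALSE keeps a data.frame even if a single column remains.
xdf_clean <- xdf[, keep, drop = FALSE]
names(xdf_clean)  # "id" "label"
```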
Community
mnel
1

You could also do:

new.df <- df[, colSums(is.na(df)) == 0, drop = FALSE]

This approach lets you subset based on the number of NA values in each column; here, only columns with zero NA values are kept.
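To illustrate subsetting on the NA count rather than on a strict zero, here is a small sketch; the data and the threshold name `max_na` are assumptions made for the example.

```r
# Hypothetical data frame with 0, 1 and 3 NA values per column.
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c(NA, 2, 3, 4),
                 c = c(NA, NA, NA, 4))

# NA count per column: a = 0, b = 1, c = 3.
na_per_col <- colSums(is.na(df))

# Keep columns with at most `max_na` missing values (assumed threshold).
max_na <- 1
new.df <- df[, na_per_col <= max_na, drop = FALSE]
names(new.df)  # "a" "b"
```

Setting `max_na <- 0` reproduces the strict "no NA at all" filter.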

USER_1
  • Could you please [edit] in an explanation of why this code answers the question? Code-only answers are [discouraged](http://meta.stackexchange.com/q/148272/274165), because they don't teach the solution. (This post was flagged by at least one user, presumably because they thought an answer without explanation should be deleted.) – Nathan Tuggy Jun 16 '15 at 02:13
0

Alternatively, if 'mat1' is the matrix:

indx <- unique(which(is.na(mat1), arr.ind=TRUE)[,2])
subset(mat1, select=-indx)
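A short sketch of this index-based approach on made-up data is shown below. One caveat worth guarding against: when the matrix has no NA values, `indx` is empty, and negative indexing with an empty vector selects zero columns rather than all of them. The guard and the name `clean` are additions for illustration, and plain `[` indexing is used in place of `subset()`.

```r
# Hypothetical 2 x 3 matrix; the NA sits in column "a".
mat1 <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2,
               dimnames = list(NULL, c("a", "b", "c")))

# Column positions that contain at least one NA.
indx <- unique(which(is.na(mat1), arr.ind = TRUE)[, 2])

# Drop those columns only if there is something to drop.
clean <- if (length(indx) > 0) mat1[, -indx, drop = FALSE] else mat1
colnames(clean)  # "b" "c"
```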
akrun