
I have a matrix whose elements are 0, 1, 2 or NA.
I want to remove the columns whose column sums are 0 or NA: drop these columns from the original matrix and put the remaining columns (those with non-zero column sums) into a new matrix. (I think I have to use na.rm = TRUE when calculating colSums and then remove the columns with a sum of 0, because with na.rm = FALSE every column that contains an NA gets a sum of NA.)

This is what my matrix looks like:

mat[1:6, 1:6]

1:11059017 1:11088817 1:11090640 1:11099385 1:1109967 1:111144756
         0          0          0          0        NA           0
         0          0          0          0         0          NA
         1         NA          2          0        NA           0
         0          0          0          1         0           2
         2          0          0          0         0           0
         0          0         NA          0         0           0

summat <- colSums(mat, na.rm = TRUE)

head(summat)

1:11059017 1:11088817 1:11090640 1:11099385 1:1109967 1:111144756
         3          0          2          1         0           2

The 2nd and 5th columns have a column sum of 0, so I should remove them from mat and keep the rest of the columns in another matrix.

My output should look like this:

mat_nonzero

1:11059017 1:11090640 1:11099385 1:111144756
         0          0          0           0
         0          0          0          NA
         1          2          0           0
         0          0          1           2
         2          0          0           0
         0         NA          0           0

How can I do that?

Data:

structure(c(0L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 
0L, 2L, 0L, 0L, NA, 0L, 0L, 0L, 1L, 0L, 0L, NA, 0L, NA, 0L, 0L, 
0L, 0L, NA, 0L, 2L, 0L, 0L), .Dim = c(6L, 6L), .Dimnames = list(
    NULL, c("X1.11059017", "X1.11088817", "X1.11090640", "X1.11099385", 
    "X1.1109967", "X1.111144756")))
SecretAgentMan
Ati

1 Answer


Work out which columns have a non-zero column sum:

i <- colSums(mat, na.rm = TRUE) != 0  # TRUE if the column sum is not 0, FALSE otherwise

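Then you can either select or drop them, e.g.

matnonzero <- mat[, i, drop = FALSE]  # all the non-zero columns
matzeros <- mat[, !i, drop = FALSE]   # all the zero columns

drop = FALSE keeps the result a matrix even when only one column is selected.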
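
A self-contained version of the two steps above, rebuilt from the dput data in the question, might look like this. It is a sketch: the variable names follow the answer, and drop = FALSE is an addition so that a single surviving column stays a matrix.

```r
# Rebuild the example matrix from the question's dput output
mat <- structure(c(0L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L,
  0L, 2L, 0L, 0L, NA, 0L, 0L, 0L, 1L, 0L, 0L, NA, 0L, NA, 0L, 0L,
  0L, 0L, NA, 0L, 2L, 0L, 0L), .Dim = c(6L, 6L), .Dimnames = list(
  NULL, c("X1.11059017", "X1.11088817", "X1.11090640", "X1.11099385",
  "X1.1109967", "X1.111144756")))

# TRUE for columns whose non-NA values do not sum to 0
i <- colSums(mat, na.rm = TRUE) != 0

# drop = FALSE keeps a matrix even if only one column is selected
matnonzero <- mat[, i, drop = FALSE]
matzeros   <- mat[, !i, drop = FALSE]

print(colnames(matnonzero))
print(colnames(matzeros))
```

On this data matnonzero keeps columns 1, 3, 4 and 6, and matzeros holds columns 2 and 5, matching the expected output in the question.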

Update in response to a comment (are there ways to do it without colSums?): yes, there are, but colSums is one of the more elegant and efficient ways.

You could do something like:

apply(is.na(mat) | mat == 0, 2, all)

which returns TRUE for each column that contains only NAs and/or 0s, so that

mat[, !apply(is.na(mat) | mat == 0, 2, all)]

will return all the non-zero columns.
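
To check that the apply version picks out the same columns as the colSums version, here is a small sketch on the question's data (all_zero_or_na, via_apply and via_colsums are illustrative names):

```r
mat <- structure(c(0L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L,
  0L, 2L, 0L, 0L, NA, 0L, 0L, 0L, 1L, 0L, 0L, NA, 0L, NA, 0L, 0L,
  0L, 0L, NA, 0L, 2L, 0L, 0L), .Dim = c(6L, 6L), .Dimnames = list(
  NULL, c("X1.11059017", "X1.11088817", "X1.11090640", "X1.11099385",
  "X1.1109967", "X1.111144756")))

# TRUE for every column in which each entry is NA or 0
all_zero_or_na <- apply(is.na(mat) | mat == 0, 2, all)

via_apply   <- mat[, !all_zero_or_na, drop = FALSE]
via_colsums <- mat[, colSums(mat, na.rm = TRUE) != 0, drop = FALSE]

print(all_zero_or_na)
print(identical(via_apply, via_colsums))
```

The two agree here because the values are non-negative. If a column could contain values that cancel out (e.g. 1 and -1), colSums(...) != 0 would drop it even though it is not all 0/NA, while the apply test would keep it.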

However, colSums is faster than apply:

system.time( replicate(1000, mat[, !apply(is.na(mat) | mat == 0, 2, all)]) )
#   user  system elapsed 
#  0.068   0.000   0.069 
system.time( replicate(1000, mat[, colSums(mat, na.rm=T) != 0]))
#   user  system elapsed 
#  0.012   0.000   0.013 

I'm sure there are many other ways to do it too.
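
One more variant, sketched here as an assumption rather than something from the answer above: count the entries per column that are neither NA nor 0 by calling colSums on a logical matrix. It stays vectorised like colSums and, unlike summing the values themselves, cannot be fooled by values that cancel out. The matrix m is hypothetical.

```r
# Hypothetical matrix with a negative value, so column "b" sums to 0
m <- cbind(a = c(0, NA, 0), b = c(1, -1, 0), c = c(2, 0, NA))

# Number of entries per column that are neither NA nor 0
nonzero_count <- colSums(!is.na(m) & m != 0)

keep <- nonzero_count > 0
kept <- m[, keep, drop = FALSE]   # keeps columns b and c
print(kept)

# For comparison, the sum-based test also drops b, because 1 + -1 = 0
print(colSums(m, na.rm = TRUE) != 0)
```

For data like the question's (only 0, 1, 2 and NA) both tests select the same columns.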


Update again, as the OP keeps adding to their question in the comments. The new question is: remove all columns that

  • contain a 0 or an NA, or
  • hold the same value all the way down the column.

The mechanics are unchanged - you just come up with a boolean (true or false) for each column deciding whether to keep it or not.

For example, just as you drop a column if all of its values are NA or 0, for your second condition you could write length(unique(column)) == 1, or all(diff(column) == 0), or one of many other equivalent tests (they agree on columns without NAs).
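
A quick sketch comparing the two constant-column tests mentioned above (is_constant_unique and is_constant_diff are hypothetical helper names). They agree on NA-free vectors but treat NA differently:

```r
# Two ways to test whether a vector holds a single repeated value
is_constant_unique <- function(x) length(unique(x)) == 1
is_constant_diff   <- function(x) all(diff(x) == 0)

is_constant_unique(c(2, 2, 2))   # TRUE
is_constant_diff(c(2, 2, 2))     # TRUE
is_constant_unique(c(0, 1, 0))   # FALSE
is_constant_diff(c(0, 1, 0))     # FALSE

# They differ once NA is involved
is_constant_unique(c(1, NA))     # FALSE: 1 and NA are two distinct values
is_constant_diff(c(1, NA))       # NA: diff(c(1, NA)) is NA
```

Because the diff version can return NA, the unique version is the safer choice inside a condition combined with || or if().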

So to combine them, remember that apply(X, 2, FUN) applies the function FUN to every column of X.

So you could do:

i <- apply(mat,
           2,
           function(col) {
               any(is.na(col) | col == 0) ||
                   length(unique(col)) == 1
           })

which returns TRUE if the column has any NAs or 0s, or if the entire column holds only one unique value. So TRUE means we should discard that column. Then subset the matrix just as before, i.e.

mat[, !i, drop = FALSE]
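
To see which rule removes which column, here is the combined filter run on a small hypothetical matrix m in which each column triggers a different condition:

```r
# Hypothetical matrix, chosen so each column triggers a different rule
m <- cbind(a = c(1, 2, 1),   # no 0/NA, not constant -> keep
           b = c(2, 2, 2),   # constant              -> drop
           c = c(1, 0, 2),   # contains a 0          -> drop
           d = c(NA, 1, 2))  # contains an NA        -> drop

# TRUE means "discard this column"
i <- apply(m, 2, function(col) {
  any(is.na(col) | col == 0) || length(unique(col)) == 1
})

print(i)
m_kept <- m[, !i, drop = FALSE]
print(m_kept)
```

Note that on the question's mat every column contains at least one 0, so this rule drops all six columns and leaves a 6 x 0 matrix.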

If you wish to add further conditions beyond the ones you have already asked for, think them through and try them yourself; if you still can't get it working, ask a new question rather than modifying this one again.

mathematical.coffee
  • I have another question: if I want to remove the columns that contain only NA or 0 values (or both), is there another solution besides the one above? In other words, how can I delete the columns whose elements are all 0 or NA without calculating colSums? – Ati Jul 15 '15 at 01:52
  • If you want to piggyback extra questions after your original one has already been answered, you should have asked them in the first place or asked another question. There are many other ways to do it (I will update the answer with some). – mathematical.coffee Jul 15 '15 at 01:54
  • @Ati, I have updated my answer to take your comments into account. If the answer has answered your question, please accept it. – mathematical.coffee Jul 15 '15 at 03:19
  • Thank you, it works! I need the second solution because, besides removing the columns with 0 and NA values, I also have to delete the columns that contain a single repeated value (only 0s, only 1s or only 2s). – Ati Jul 16 '15 at 13:53
  • For this I used your second solution and added more conditions to it: mat[, !apply(is.na(mat) | mat == 0 | mat==1 | mat==2, 2, all)] – Ati Jul 16 '15 at 13:55
  • Would you please help me delete these columns too? – Ati Jul 16 '15 at 14:01
  • It is very confusing that you keep changing your question after I answer it. – mathematical.coffee Jul 16 '15 at 23:22
  • Way too late here, but Ati, the part you are adding with == 0 | == 1 | == 2 is not doing what you think. Combined with all(), it flags every column whose values are all 0, 1, 2 or NA, not just the columns holding a single value. The length(unique(col)) == 1 condition mathematical.coffee adds already selects any column that has only one value (e.g. all 1s or all 2s). – Kirk Geier Jul 07 '22 at 14:25
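
A quick check of the last comment's point on the question's data (op_rule and constant_rule are illustrative names; op_rule is the condition Ati posted):

```r
mat <- structure(c(0L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L,
  0L, 2L, 0L, 0L, NA, 0L, 0L, 0L, 1L, 0L, 0L, NA, 0L, NA, 0L, 0L,
  0L, 0L, NA, 0L, 2L, 0L, 0L), .Dim = c(6L, 6L), .Dimnames = list(
  NULL, c("X1.11059017", "X1.11088817", "X1.11090640", "X1.11099385",
  "X1.1109967", "X1.111144756")))

# The condition from the comments: TRUE where every entry is NA, 0, 1 or 2
op_rule <- apply(is.na(mat) | mat == 0 | mat == 1 | mat == 2, 2, all)
print(op_rule)                          # TRUE for all six columns
print(ncol(mat[, !op_rule, drop = FALSE]))  # so nothing is kept

# A rule that only flags columns holding a single repeated value
constant_rule <- apply(mat, 2, function(col) length(unique(col)) == 1)
print(constant_rule)
```

Since every entry of this matrix is 0, 1, 2 or NA, op_rule is TRUE everywhere. constant_rule is FALSE for every column here, because unique() counts NA as a distinct value (columns 2 and 5 contain only 0s and NAs).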