-4

Given a very big matrix, I need to remove the rows in which 90% of the entries are less than 20. Could someone help me implement this in R?

Marc in the box
user2806363
  • 2
    I think people here are expecting some [reproducible code](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), in order to efficiently help you. –  Aug 06 '14 at 06:37
  • 2
    And people would prefer that you clearly define your problem the first time around.... – A5C1D2H2I1M1N2O1R2T1 Aug 06 '14 at 07:00

1 Answer

4

This might help you:

m <- matrix(1:20, nrow = 4)
m
#      [,1] [,2] [,3] [,4] [,5]
#[1,]    1    5    9   13   17
#[2,]    2    6   10   14   18
#[3,]    3    7   11   15   19
#[4,]    4    8   12   16   20

Now keep only the rows where at least 90% of the row's entries are greater than 2 (all other rows are removed):

m[rowSums(m > 2) >= 0.9*ncol(m),]  
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    3    7   11   15   19
#[2,]    4    8   12   16   20

Some explanation: rowSums(m > 2) counts how many entries in each row are greater than 2. 0.9*ncol(m) is the threshold of 90% of the columns. The two are compared for each row: if the result is TRUE, the row is kept; if it is FALSE, the row is dropped.
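To make the counting step concrete, and to cover the other reading of the question (dropping rows in which at least 90% of the entries fall below a cutoff), here is a minimal sketch. The helper name `drop_low_rows` and its arguments `cutoff` and `frac` are hypothetical, introduced only for illustration. `drop = FALSE` is an addition that keeps the result a matrix even when only one row survives.

```r
# Hypothetical helper: remove rows in which at least `frac` of the
# entries are below `cutoff`.
drop_low_rows <- function(m, cutoff, frac = 0.9) {
  # m < cutoff is a logical matrix; rowMeans() gives the share of TRUE per row
  low_share <- rowMeans(m < cutoff)
  # keep only rows whose share of low entries is below `frac`
  m[low_share < frac, , drop = FALSE]
}

m <- matrix(1:20, nrow = 4)

rowSums(m > 2)   # entries greater than 2 per row: 4 4 5 5

# Rows 1 to 3 have all entries below 20 and are removed;
# row 4 (4 8 12 16 20) has only 80% below 20 and is kept.
kept <- drop_low_rows(m, cutoff = 20)
kept
```

Working with `rowMeans()` instead of `rowSums()` compares proportions directly, so the column count does not need to appear in the condition. Missing values are not handled here; that would be a further assumption about the data.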

talat
  • +1 - I prefer a slight rejig to `m[rowSums(m > 2)/ncol(m) >= 0.9,]`, but you say /təˈmeɪtoʊz/ and I say /təˈmɑːtoʊz/, so... – thelatemail Aug 06 '14 at 06:50
  • Thanks @thelatemail, I also like your suggestion :) – talat Aug 06 '14 at 06:51
  • @beginneR, thanks, but this is not the right answer. What I mean by 90% of entries is that, if I have 100 columns, I want to remove rows in which 90 entries are less than 2. – user2806363 Aug 06 '14 at 06:55
  • 2
    You can easily adapt my code to your specific needs, which were not well specified in your question. See what happens when you change the >= to < or when you replace 0.9 with 0.1, and so on. – talat Aug 06 '14 at 07:05
  • Shouldn't rowSums add up all numbers > 2 in that row rather than counting them? How can one get the sum of all numbers > 2 in a row? – rnso Aug 06 '14 at 09:18
  • @rnso, `m > 2` creates a logical matrix, which is then summed up by row. I think what you mean is `rowSums(m[m > 2])`, correct? – talat Aug 06 '14 at 09:21
  • 1
    @beginneR: OK. So all TRUEs are counted or summed. But rowSums(m[m > 2]) does not work: Error in rowSums(m[m > 2]) : 'x' must be an array of at least two dimensions – rnso Aug 06 '14 at 09:41
  • @rnso, right, in that case, you'd have to use `apply(m, 1, function(x) sum(x[x>2]))`. But that's not what was asked in the question, if I understood correctly – talat Aug 06 '14 at 10:21
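
For the side question in the comments (adding up the values greater than 2 in each row, rather than counting them), a short sketch on the same example matrix follows. The vectorised form that multiplies by the logical matrix is an alternative offered here, not something from the thread; on this example it gives the same result as the `apply` version.

```r
m <- matrix(1:20, nrow = 4)

# Row-wise sum of the values greater than 2, as suggested in the comments
sums_apply <- apply(m, 1, function(x) sum(x[x > 2]))

# Alternative: entries failing the test are multiplied by FALSE (0),
# so they contribute nothing to rowSums()
sums_vec <- rowSums(m * (m > 2))

sums_apply  # 44 48 55 60
```

`rowSums(m[m > 2])` fails because indexing a matrix with a logical matrix returns a plain vector, which no longer has rows to sum over.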