I have a matrix and I want to keep only the rows in which at least one element is 5 or more. In other words, rows whose elements are all less than 5 should be filtered out.

For example:

2 4 6 2 1
1 2 3 1 2
5 4 7 2 1

In this matrix, the second row should be filtered out because all of its elements are less than 5.

Here is what I wrote:

for(i in 1:length(matrix[,1]){
for(j in 2:17){
if(any(matrix[i,j]>=5)){matrix=matrix} else {matrix=matrix[-i,]}
}}

But it doesn't work.

What can I do instead?

Fate

1 Answer

Adapting some of the suggestions in this...

1) Identify which rows should be eliminated:

a <- as.matrix(read.table(text = "2 4 6 2 1
                                  1 2 3 1 2
                                  5 4 7 2 1"))

a
     V1 V2 V3 V4 V5
[1,]  2  4  6  2  1
[2,]  1  2  3  1  2
[3,]  5  4  7  2  1

bye <- sapply(seq_len(nrow(a)), function(x){all(a[x,]<5)})
bye
[1] FALSE  TRUE FALSE

2) Use that to subset the matrix:

a2 <- a[!bye, , drop = FALSE]
a2
     V1 V2 V3 V4 V5
[1,]  2  4  6  2  1
[2,]  5  4  7  2  1
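
For larger matrices, the row-by-row `sapply` call in step 1 can be replaced by a single vectorized test. The following is a minimal sketch rather than part of the original answer. It assumes `a` is a numeric matrix without missing values and uses the threshold 5 from the question; the name `keep` is introduced here only for illustration.

```r
# Rebuild the example matrix from the question.
a <- as.matrix(read.table(text = "2 4 6 2 1
                                  1 2 3 1 2
                                  5 4 7 2 1"))

# a >= 5 is a logical matrix; rowSums() counts the TRUE values per row,
# so keep[i] is TRUE when row i has at least one element >= 5.
keep <- rowSums(a >= 5) > 0

# drop = FALSE keeps the result a matrix even if only one row survives.
a2 <- a[keep, , drop = FALSE]
a2
```

Because the comparison and the row sums each run once over the whole matrix, this avoids calling an R function once per row.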
paqmo
  • That sounds ok but when you have thousands of rows, this approach can be a bit difficult... What do you recommend in this case? – Fate Oct 14 '16 at 09:39
  • I tried it with a 5000x5000 matrix and it worked fine. A bit slow. Not sure what you mean by 'a bit difficult.' Maybe someone else has an idea for a more efficient solution. You could encapsulate the above in a function `filter <- function(m,n){ b <- sapply(1:length(m[1,]), function(x){all(m[x,] – paqmo Oct 14 '16 at 13:25
  • One more thought! You could use the `filter()` function from the `dplyr` package. The command looks like this: `a %>% filter(rowSums(. >=n) >0)`, where n is the number you want to filter based on. Just make sure that `a` is a data frame. – paqmo Oct 15 '16 at 21:00
  • Thanks a lot... Your codes work well... The problem is my first column is character and the rest are numbers... So when I try to apply your code to my matrix, it can't handle the first column and then the results are not accurate... I have to keep the first number because it's the name of each subject which is important in further analysis... Could you please help me know what I can do? – Fate Oct 16 '16 at 13:04
  • I have to keep the first column* – Fate Oct 16 '16 at 13:04
  • A few things you can do. (1) Turn the first column into row names using `rownames(a) <- a[,1]` and then subset `a` to exclude the first column: `a <- a[,-1]`. (2) If you read the data using `read.csv` or `read.table`, the option `row.names=1` turns the first column into row names. Or if, for some reason, you want to keep the characters in the first column, simply use this code `filter <- function(m,n){ b <- sapply(1:length(m[,1]), function(x){all(m[x,2:length(m[1,])] – paqmo Oct 16 '16 at 16:31
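
The case raised in the comments, where the first column holds subject names and the remaining columns are numeric, can be sketched as follows. This is an illustrative sketch, not the code from the truncated comment above: the data frame `d`, the column name `id`, the subject labels, and the helper `filter_rows` are all hypothetical. It assumes the data is held in a data frame, since a matrix with a character column would store every value as character.

```r
# Hypothetical data: the first column is a subject name, the rest are numeric.
d <- data.frame(id = c("s1", "s2", "s3"),
                V1 = c(2, 1, 5),
                V2 = c(4, 2, 4),
                V3 = c(6, 3, 7),
                stringsAsFactors = FALSE)

# Keep the rows in which at least one numeric column is >= n.
# id_col is the position of the identifier column; it is excluded
# from the comparison but retained in the result.
filter_rows <- function(df, n, id_col = 1) {
  keep <- rowSums(df[, -id_col, drop = FALSE] >= n) > 0
  df[keep, , drop = FALSE]
}

filter_rows(d, 5)
```

Keeping the identifier as an ordinary column, rather than moving it to row names, means the subject names stay available as data for later analysis.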