The dataset is like this:

"1" 10 40 "r" "q" "0" "r" "r" "0" "r" "0" "0" "0" "0" "0" "t" "q" "0" "0" "s" "0" "r" 0 "0" 0 "0" "0" 0 0 0 "0"
"2" 10 173 "s" "s" "s" "0" "0" "s" "s" "0" "t" "t" "s" "t" "t" "r" "s" "0" "q" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"3" 10 2107 "t" "0" "0" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"4" 10 993 "s" "0" "q" "s" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"5" 10 1712 "t" "0" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "s" "0" "t" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"6" 776 1872 "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "r" 0 "0" "0" 0 0 0 "s"

The output should keep only the rows with at least four non-zero entries in columns 3 to 30:

"1" 10 40 "r" "q" "0" "r" "r" "0" "r" "0" "0" "0" "0" "0" "t" "q" "0" "0" "s" "0" "r" 0 "0" 0 "0" "0" 0 0 0 "0"
"2" 10 173 "s" "s" "s" "0" "0" "s" "s" "0" "t" "t" "s" "t" "t" "r" "s" "0" "q" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"4" 10 993 "s" "0" "q" "s" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"5" 10 1712 "t" "0" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "s" "0" "t" "0" 0 "0" 0 "0" "0" 0 0 0 "0"

The code that I have tried is:

x <- read.table("sample.txt")
nrowx <- nrow(x)
for (i in 1:nrowx) {
    count <- 0
    for (j in 3:30) {
        if (x[i, j] != 0) {
            count <- count + 1
        }
    }
    if (count < 4) {
        x[i, ] <- NA
    }
}
x <- x[complete.cases(x), ]

Please suggest a method that doesn't involve a loop.

phoenix
  • Please supply a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas Oct 23 '13 at 14:09
  • Please show us what you have tried. [Questions asking for code must include attempted solutions, why they didn't work, and the expected results](http://stackoverflow.com/help/on-topic) – Henrik Oct 23 '13 at 14:09
  • Are you just trying to remove the 2 rows beginning with 3 and 6? If so, look at `subset`. – TheComeOnMan Oct 23 '13 at 14:24
  • I have added the code that I have worked on. It gives the correct result but is quite slow. Can you suggest a method that uses vectorization? – phoenix Oct 23 '13 at 14:28
  • Well, `"0"` is not equal to `0`. What is your data and what do you really want to test for? – Carl Witthoft Oct 23 '13 at 14:32

1 Answer

It looks like none of your rows has fewer than four non-zero entries when all columns are counted.

For example, with `tab` being your table, this prints the number of non-zero entries per row:

apply(tab, 1, function(x)sum(x!="0"))
 [1] 12 16  5  7  7  5

To eliminate, for example, all rows that have five or fewer non-zero entries, you could do

tab[-which(apply(tab, 1, function(x) sum(x != "0")) <= 5), ]

I am not sure if the first column in your data is treated as a column in your data frame, however.
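If only columns 3 to 30 should be counted, as in the loop from the question, a loop-free sketch could compare the whole block of columns at once and sum the result per row with `rowSums`. The block below inlines the six sample rows through the `text` argument of `read.table` so that it runs on its own; the names `x`, `nonzero`, `keep` and `result` are only illustrative, and the threshold of four is taken from the question's `count<4` test.

```r
# Sample rows from the question, inlined so the example is self-contained
x <- read.table(text = '
"1" 10 40 "r" "q" "0" "r" "r" "0" "r" "0" "0" "0" "0" "0" "t" "q" "0" "0" "s" "0" "r" 0 "0" 0 "0" "0" 0 0 0 "0"
"2" 10 173 "s" "s" "s" "0" "0" "s" "s" "0" "t" "t" "s" "t" "t" "r" "s" "0" "q" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"3" 10 2107 "t" "0" "0" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"4" 10 993 "s" "0" "q" "s" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"5" 10 1712 "t" "0" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "s" "0" "t" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"6" 776 1872 "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "r" 0 "0" "0" 0 0 0 "s"
', stringsAsFactors = FALSE)

# Logical matrix: TRUE where an entry in columns 3 to 30 is not zero.
# Comparing against "0" works for numeric and character columns alike.
nonzero <- x[, 3:30] != "0"

# Keep the rows with at least four non-zero entries
keep <- rowSums(nonzero) >= 4
result <- x[keep, ]

# Serial numbers (first column) of the rows that remain
print(result$V1)
```

Indexing with the logical vector `keep` also avoids a pitfall of `-which(...)`: when no row matches the condition, `which` returns an empty vector and `tab[-integer(0), ]` selects zero rows instead of all of them.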

Does this help?

user1981275
  • The 1st column is for serial no. and as per the code provided in question, comparison for nonzero entries is done from column 3. Anyways the alternative you provided is working fine. thanks! :) – phoenix Oct 23 '13 at 14:42
  • Ok, then `tab[-which(apply(tab[,3:30], 1, function(x)sum(x!="0"))<4),]` should give you what you want. – user1981275 Oct 23 '13 at 14:56