
I have a data frame like this:

Bacteria                   feature_id  s_counts  s1   s2    s3  s4  s5   s6
s__Bacillus_thuringiensis  c34ed8      4745      0    1300  12  0   190  230
s__Bacillus_pumilus        d73583      333       333  0     0   0   0    0

I would like to keep only the rows whose counts are > 0 in at least 4 of the columns from s1 to the last column, so that I get:

Bacteria                   feature_id  s_counts  s1   s2    s3  s4  s5   s6
s__Bacillus_thuringiensis  c34ed8      4745      0    1300  12  0   190  230
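
A minimal base R sketch of this filter, assuming the sample columns always run from `s1` to the last column. The two example rows are rebuilt by hand, and `sample_cols` and `n_positive` are illustrative names, not taken from the question. "At least 4" translates to `>= 4`, and `s_counts` is left out because it is a total rather than a sample column:

```r
# Rebuild the two example rows from the question
df <- data.frame(
  Bacteria   = c("s__Bacillus_thuringiensis", "s__Bacillus_pumilus"),
  feature_id = c("c34ed8", "d73583"),
  s_counts   = c(4745, 333),
  s1 = c(0, 333), s2 = c(1300, 0), s3 = c(12, 0),
  s4 = c(0, 0),   s5 = c(190, 0),  s6 = c(230, 0)
)

# Positions of the sample columns: from s1 to the last column
sample_cols <- match("s1", names(df)):ncol(df)

# Per row, count how many sample columns are > 0
# (comparing a data frame with 0 gives a logical matrix; rowSums counts the TRUEs)
n_positive <- rowSums(df[, sample_cols, drop = FALSE] > 0)

# Keep rows with at least 4 positive sample columns
filtered <- df[n_positive >= 4, ]
filtered
```

On the example data, `n_positive` is 4 for the B. thuringiensis row and 1 for the B. pumilus row, so only the first row is kept; with `> 4` instead of `>= 4`, the first row would be dropped as well.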

This is similar to the question "Subset data frame based on number of rows per group", except that I am interested in columns instead of rows, so I couldn't figure out how to adapt that solution to columns. I can't see a way of grouping the data, since the condition is on columns rather than rows. Could someone help me with that? Also, I would appreciate a dplyr solution, if possible.
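
For the dplyr request, one possible sketch, assuming dplyr 1.0.0 or later (the version that introduced `across()`) and the same example data rebuilt by hand. It is not taken from the similar question mentioned above; the idea is to turn each sample column into TRUE/FALSE and count the TRUEs per row:

```r
library(dplyr)

# Rebuild the two example rows from the question
df <- data.frame(
  Bacteria   = c("s__Bacillus_thuringiensis", "s__Bacillus_pumilus"),
  feature_id = c("c34ed8", "d73583"),
  s_counts   = c(4745, 333),
  s1 = c(0, 333), s2 = c(1300, 0), s3 = c(12, 0),
  s4 = c(0, 0),   s5 = c(190, 0),  s6 = c(230, 0)
)

filtered <- df %>%
  # across() turns s1 ... last column into TRUE/FALSE (value > 0);
  # rowSums() then counts the TRUEs in each row
  filter(rowSums(across(s1:last_col(), ~ .x > 0)) >= 4)

filtered
```

`s1:last_col()` selects by name, so `s_counts` is excluded without counting column positions. `if_any()` and `if_all()` can only express "any" or "all" columns, which is why the count goes through `rowSums()` here.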

Cheers,

Leo

Leonardo

  • You need `rowSums`. Something like `df[rowSums(df[-c(1:2)] > 0) > 4,]` – Sotos Jan 08 '20 at 14:47
  • That throws an error, "Error in `[.default`(df, -1:2) : only 0's may be mixed with negative subscripts", but someone did post a working answer and then deleted it – Leonardo Jan 08 '20 at 14:56
  • Try it again... I edited it, and it shouldn't throw that error anymore. Also check out the question I marked this as a duplicate of – Sotos Jan 08 '20 at 14:56
  • > df
                       Bacteria feature_id s_counts  s1   s2 s3 s4  s5  s6
    1 s__Bacillus_thuringiensis     c34ed8     4745   0 1300 12  0 190 230
    2       s__Bacillus_pumilus     d73583      333 333    0  0  0   0   0
    > df[rowSums(df[-1:2] > 0) <4,]
    Error in `[.default`(df, -1:2) : only 0's may be mixed with negative subscripts
    – Leonardo Jan 08 '20 at 14:58
  • Check my comment again. Re-copy it, but most importantly, check out the dupe target – Sotos Jan 08 '20 at 14:58
  • Thank you. Now it is working! I am going to check the other post you mentioned. – Leonardo Jan 08 '20 at 14:59
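
The error in the comments is an operator-precedence issue: in R, unary minus binds more tightly than `:`, so `-1:2` is `(-1):2`, i.e. `c(-1, 0, 1, 2)`, which mixes negative and positive subscripts, whereas `-c(1:2)` or `-(1:2)` gives `c(-1, -2)`. A small sketch on the example data (`bad_index`, `good_index`, `counts` and `kept` are illustrative names); it drops the first three columns so that `s_counts` is not counted either:

```r
# Rebuild the two example rows from the question
df <- data.frame(
  Bacteria   = c("s__Bacillus_thuringiensis", "s__Bacillus_pumilus"),
  feature_id = c("c34ed8", "d73583"),
  s_counts   = c(4745, 333),
  s1 = c(0, 333), s2 = c(1300, 0), s3 = c(12, 0),
  s4 = c(0, 0),   s5 = c(190, 0),  s6 = c(230, 0)
)

# Unary minus binds tighter than ':', so -1:2 is (-1):2 = c(-1, 0, 1, 2)
bad_index <- -1:2
# Mixing negative and positive subscripts is an error; capture it instead of stopping
res <- tryCatch(df[bad_index], error = function(e) e)
print(conditionMessage(res))

# Parenthesised: -(1:3) = c(-1, -2, -3) drops Bacteria, feature_id and s_counts
good_index <- -(1:3)
counts <- df[good_index]

# Rows with at least 4 positive sample columns
kept <- df[rowSums(counts > 0) >= 4, ]
kept
```

With `s_counts` excluded, the threshold has to be `>= 4`; the `> 4` in the first comment only keeps the B. thuringiensis row because `s_counts` adds a fifth positive column.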
