I have loaded a table of this kind (but a lot bigger) in R (titles in caps):

https://postimg.org/image/66qmiayj5/

Does anyone know how I could make subsets of this table with:

  • all rows with the value 'yes' for all the columns
  • all rows with 'yes' in every column but one 'no'
  • all rows with 'yes' in every column but two 'no'
  • and so on, for every possible number of 'no' values
    Reproducible data, like `dput` data or data that you create in a code block in the question are highly preferred to screenshots. – Hack-R Jul 08 '16 at 14:00
    Welcome to StackOverflow. Please take a look at these tips on how to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Perhaps the following tips on [asking a good question](http://stackoverflow.com/help/how-to-ask) may also be worth a read. – lmo Jul 08 '16 at 14:01

2 Answers

You can use rowSums(). With your example, recoded as TRUE for 'yes' and FALSE for 'no':

> df <- data.frame(GENES = 1:3,
+                  STRAIN_1 = c(TRUE, TRUE, FALSE),
+                  STRAIN_2 = c(TRUE, TRUE, FALSE),
+                  STRAIN_3 = c(TRUE, FALSE, TRUE))
> df
  GENES STRAIN_1 STRAIN_2 STRAIN_3
1     1     TRUE     TRUE     TRUE
2     2     TRUE     TRUE    FALSE
3     3    FALSE    FALSE     TRUE

You can then subset the data frame by the number of TRUE values in each row.

> df[rowSums(df[, -1]) == 1, ]
  GENES STRAIN_1 STRAIN_2 STRAIN_3
3     3    FALSE    FALSE     TRUE

> df[rowSums(df[, -1]) == 2, ]
  GENES STRAIN_1 STRAIN_2 STRAIN_3
2     2     TRUE     TRUE    FALSE

Note that df[, -1] excludes the first column (GENES) and that rowSums(df[, -1]) returns the number of TRUE values in each row.
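
If the table stores the text values 'yes' and 'no' rather than logicals, as the question suggests, the same idea can be adapted by comparing against "yes" first and then counting the FALSE entries, which gives the number of 'no' values per row. This is only a sketch: the data frame `genes`, its gene labels, and the helper names `is_yes` and `n_no` are made up for illustration and are not taken from the asker's data.

```r
# Hypothetical stand-in for the asker's table: 'yes'/'no' stored as text.
genes <- data.frame(
  GENES    = c("g1", "g2", "g3"),
  STRAIN_1 = c("yes", "yes", "no"),
  STRAIN_2 = c("yes", "yes", "no"),
  STRAIN_3 = c("yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Compare every strain column with "yes" to get a logical matrix.
is_yes <- genes[, -1] == "yes"

# Number of 'no' entries per row = number of FALSE values in that row.
n_no <- rowSums(!is_yes)

all_yes <- genes[n_no == 0, ]  # 'yes' in every column
one_no  <- genes[n_no == 1, ]  # 'yes' everywhere except one 'no'
two_no  <- genes[n_no == 2, ]  # 'yes' everywhere except two 'no'
```

Counting the 'no' values directly, instead of the TRUE values, keeps the subsetting conditions the same no matter how many strain columns the real table has.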

Pierre Dudek

Try using the filter() function from dplyr. There are ways of doing this without creating a new column, but I wanted to make it obvious what's going on in this code.

# Create a sample data frame
df <- rbind(c("Yes", "Yes", "No"),
            c("Yes", "Yes", "No"),
            c("Yes", "Yes", "No"),
            c("Yes", "Yes", "No"),
            c("Yes", "No", "No"),
            c("Yes", "Yes", "Yes"),
            c("Yes", "Yes", "Yes"))
df <- data.frame(df)

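# Create a column that counts the number of "No" values in each row.
# The sample has no ID column, so every column is counted; if the first
# column holds IDs (such as GENES), use df[-1] in place of df.
df$count <- rowSums(df == "No")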

# Filter for the appropriate number of "No"
library(dplyr)
df %>% filter(count == 0) # All "Yes"
df %>% filter(count == 1) # One "No"
df %>% filter(count == 2) # Two "No"
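
To get every subset at once rather than writing one filter per count, one possible approach is base R's split(), which groups the rows by the number of "No" values. This is a sketch under assumptions: the names `answers`, `n_no`, and `by_no_count` are introduced here for illustration, and the data simply repeats the sample rows above so the snippet runs on its own.

```r
# Same sample rows as above, rebuilt so this snippet is self-contained.
answers <- data.frame(rbind(
  c("Yes", "Yes", "No"),
  c("Yes", "Yes", "No"),
  c("Yes", "Yes", "No"),
  c("Yes", "Yes", "No"),
  c("Yes", "No", "No"),
  c("Yes", "Yes", "Yes"),
  c("Yes", "Yes", "Yes")
))

# Count the "No" values in each row (all columns are data columns here).
n_no <- rowSums(answers == "No")

# One data frame per distinct count, named by that count.
by_no_count <- split(answers, n_no)

names(by_no_count)    # the counts that actually occur
by_no_count[["0"]]    # rows with "Yes" in every column
by_no_count[["1"]]    # rows with exactly one "No"
```

The list only contains counts that occur in the data, so a lookup such as by_no_count[["3"]] returns NULL when no row has three "No" values.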
Andrew Brēza