R studio .csv table subset

Question

I have loaded a table like this one (but much bigger) into R (column titles in caps):

Does anyone know how I could make subsets of this table containing:

- all rows with 'yes' in every column
- all rows with 'yes' in every column except one 'no'
- all rows with 'yes' in every column except two 'no'
- and so on

Reproducible data, like `dput` data or data that you create in a code block in the question are highly preferred to screenshots. — Hack-R, Jul 08 '16 at 14:00
Welcome to StackOverflow. Please take a look at these tips on how to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Perhaps the following tips on [asking a good question](http://stackoverflow.com/help/how-to-ask) may also be worth a read. — lmo, Jul 08 '16 at 14:01

score 0 · Accepted Answer · answered Jul 08 '16 at 15:25

You can use rowSums. With your example:

```r
> df <- data.frame(GENES = 1:3, STRAIN_1 = c(TRUE, TRUE, FALSE),
+                  STRAIN_2 = c(TRUE, TRUE, FALSE), STRAIN_3 = c(TRUE, FALSE, TRUE))
> df
  GENES STRAIN_1 STRAIN_2 STRAIN_3
1     1     TRUE     TRUE     TRUE
2     2     TRUE     TRUE    FALSE
3     3    FALSE    FALSE     TRUE
```

You can then subset the data frame by the number of `TRUE` values in each row:

```r
> df[rowSums(df[, -1]) == 1, ]
  GENES STRAIN_1 STRAIN_2 STRAIN_3
3     3    FALSE    FALSE     TRUE

> df[rowSums(df[, -1]) == 2, ]
  GENES STRAIN_1 STRAIN_2 STRAIN_3
2     2     TRUE     TRUE    FALSE
```

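Note that `df[, -1]` excludes the first column (GENES) and that `rowSums(df[, -1])` counts the `TRUE` values in each row. With three strain columns, a sum of 3 means all 'yes', 2 means exactly one 'no', 1 means two 'no', and so on.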
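The example above uses `TRUE`/`FALSE`, whereas the question's table holds 'yes'/'no'. A minimal sketch, assuming the CSV has a GENES column followed by strain columns that contain exactly the lowercase strings "yes" and "no" (the inline CSV text is made-up sample data standing in for the real file, and `your_file.csv` is a placeholder name), converts those columns to logicals and then uses base R `split()` to build every subset at once, keyed by the number of 'no' values:

```r
# Made-up sample data in the question's layout; for a real file use
# df <- read.csv("your_file.csv", stringsAsFactors = FALSE) (placeholder name)
csv <- "GENES,STRAIN_1,STRAIN_2,STRAIN_3
1,yes,yes,yes
2,yes,yes,no
3,no,no,yes
4,yes,no,yes"
df <- read.csv(text = csv, stringsAsFactors = FALSE)

# Turn the "yes"/"no" strain columns into TRUE/FALSE, leaving GENES untouched
# (== is case-sensitive, so match the exact spelling used in the file)
df[-1] <- lapply(df[-1], function(x) x == "yes")

# Number of "no" per row = number of strain columns minus number of TRUE
n_no <- ncol(df) - 1 - rowSums(df[, -1])

# One data frame per count of "no": "0" = all yes, "1" = one no, ...
by_no <- split(df, n_no)
by_no[["0"]]
by_no[["1"]]
```

Only counts that actually occur get an entry in `by_no`, so with this sample `by_no[["3"]]` is `NULL` rather than an empty data frame.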

score 0 · Answer 2 · answered Jul 08 '16 at 15:31

Try using dplyr's `filter()` function. There are ways of doing this without creating a new column, but I wanted to make it obvious what is going on in this code.

```r
library(dplyr)

# Create a sample data frame: a GENES ID column followed by three strain columns
m <- rbind(c("Yes", "Yes", "No"),
           c("Yes", "Yes", "No"),
           c("Yes", "Yes", "No"),
           c("Yes", "Yes", "No"),
           c("Yes", "No", "No"),
           c("Yes", "Yes", "Yes"),
           c("Yes", "Yes", "Yes"))
colnames(m) <- c("STRAIN_1", "STRAIN_2", "STRAIN_3")
df <- data.frame(GENES = 1:7, m, stringsAsFactors = FALSE)

# Create a column that counts the number of "No" values per row
# (df[-1] skips GENES; == is case-sensitive, so match your file's spelling)
df$count <- rowSums(df[-1] == "No")

# Filter for the appropriate number of "No"
df %>% filter(count == 0) # All "Yes"
df %>% filter(count == 1) # One "No"
df %>% filter(count == 2) # Two "No"
```
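As noted above, the helper column is optional. A sketch of the same three filters without it, assuming dplyr 1.1.0 or later (needed for `pick()`; `if_all()` exists from 1.0.5) and the same sample values with a GENES column in front; the names `all_yes`, `one_no` and `two_no` are just illustrative:

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

# Same "Yes"/"No" values as the sample above, with a GENES ID column first
df <- data.frame(GENES    = 1:7,
                 STRAIN_1 = rep("Yes", 7),
                 STRAIN_2 = c("Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes"),
                 STRAIN_3 = c("No", "No", "No", "No", "No", "Yes", "Yes"))

# All "Yes": keep rows where every column except GENES equals "Yes"
all_yes <- df %>% filter(if_all(-GENES, ~ .x == "Yes"))

# Exactly n "No": count "No" across the strain columns on the fly
one_no <- df %>% filter(rowSums(pick(-GENES) == "No") == 1)
two_no <- df %>% filter(rowSums(pick(-GENES) == "No") == 2)
```

`if_all()` reads most clearly for the all-"Yes" case, while the `rowSums(pick(...))` form generalizes to any number of "No" values.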
