Find differences among groups based on a condition in dplyr

Question

I have a data frame that looks like this, but it's gigantic.

df = data.frame(gene=c("A","B","F","A","D","E","B","C","D","G"),
                group=c("group1","group1","group1","group2","group2","group2","group3","group3","group3","group3"))
df

 gene    group
   A     group1
   B     group1
   F     group1
   A     group2
   D     group2
   E     group2
   B     group3
   C     group3
   D     group3
   G     group3

Based on the gene column, I want to find the genes that appear in groups containing gene "A" but in none of the groups that do not contain gene "A".

After the filtering, I want my data to look like this:

gene group
 F    group1
 E    group2

F and E are kept because each of them appears only in groups that contain gene A and is not present in any other group.
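The rule above amounts to a set difference: take the genes found in groups that contain "A", remove every gene that also occurs in a group without "A", and drop "A" itself. A minimal base-R sketch of that idea (not part of the original thread), assuming the example `df` shown above; `a_groups`, `in_A` and `only_A` are illustrative names:

```r
df <- data.frame(
  gene  = c("A","B","F","A","D","E","B","C","D","G"),
  group = c("group1","group1","group1","group2","group2","group2",
            "group3","group3","group3","group3")
)

a_groups <- unique(df$group[df$gene == "A"])      # groups that contain gene "A"
in_A     <- df$group %in% a_groups                 # TRUE for rows in such a group
only_A   <- setdiff(df$gene[in_A], df$gene[!in_A]) # genes never seen outside "A" groups

# Keep rows from "A" groups whose gene is exclusive to them, excluding "A" itself
res <- df[in_A & df$gene %in% only_A & df$gene != "A", ]
res
```

This avoids any joins, which may help on a very large table, but it has not been benchmarked here.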

Thank you, Martin. I am trying to find the genes that belong only to the groups with A and to no other groups. C and G are not present in any group with A. Does that make sense? - LDT, Oct 03 '21 at 20:47

akrun · Accepted Answer · 2021-10-03T20:46:25.613


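We can split the rows into those from groups whose genes include 'A' and those from groups whose genes do not, then use anti_join to drop every gene that also occurs in a group without 'A', and finally remove 'A' itself.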

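library(dplyr)

# Rows from groups that contain gene 'A'
tmp1 <- df %>%
  filter(group %in% group[gene %in% 'A'])

# Rows from groups that do not contain gene 'A'
tmp2 <- df %>%
  group_by(group) %>%
  filter(!'A' %in% gene) %>%
  ungroup()

# Keep genes from the 'A' groups that never occur in a non-'A' group, then drop 'A'
anti_join(tmp1, tmp2, by = 'gene') %>%
  filter(gene != 'A')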

Output:

 gene  group
1    F group1
2    E group2
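The same result can also be reached in a single pipeline by flagging each group for whether it contains 'A' and then keeping, per gene, only those whose rows are all in flagged groups. This is a sketch of an alternative formulation (not from the accepted answer), assuming the example `df`; `has_A` and `res` are illustrative names:

```r
library(dplyr)

df <- data.frame(
  gene  = c("A","B","F","A","D","E","B","C","D","G"),
  group = c("group1","group1","group1","group2","group2","group2",
            "group3","group3","group3","group3")
)

res <- df %>%
  group_by(group) %>%
  mutate(has_A = "A" %in% gene) %>%    # TRUE on every row of a group that contains "A"
  group_by(gene) %>%                   # regroup by gene (replaces the group grouping)
  filter(all(has_A), gene != "A") %>%  # gene occurs only in "A" groups; drop "A" itself
  ungroup() %>%
  select(gene, group)

res
```

It avoids building two intermediate tables, at the cost of two grouping passes over the data.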

answered Oct 03 '21 at 20:45 by akrun (edited Oct 03 '21 at 20:46)

