I have a tbl_df that looks like this:

   Genes  Cell     AC    FC
   <chr>  <chr> <dbl> <dbl>
 1 abts-1 MSx1   94.9  6.81
 2 acp-2  Ea    301.  32.4 
 3 acp-2  Ep    188.  20.6 
 4 acs-13 MSx1   69.1  8.20
 5 acs-22 Ea    176.  19.4 
 6 acs-22 Ep     64.3  7.70
 7 acs-3  Ea    156.  17.2 
 8 acs-3  Ep     75.5  8.87
 9 add-2  Ea    123.   6.62
10 add-2  Ep    125.   6.69

I would like to remove every row whose "Genes" value occurs more than once, without keeping any copy of those rows. The result should look like this:

   Genes  Cell     AC    FC
   <chr>  <chr> <dbl> <dbl>
 1 abts-1 MSx1   94.9  6.81
 2 acs-13 MSx1   69.1  8.20

where none of the repeated genes are selected and the data in the other columns are preserved. I have tried unique(), distinct(), !duplicated() etc., but none of these removes all the non-unique rows.

1 Answer

Try this:

library(dplyr)

# Group by gene and keep only the groups that contain exactly one row
new <- df %>%
  group_by(Genes) %>%
  filter(n() == 1)

Output:

# A tibble: 2 x 4
# Groups:   Genes [2]
  Genes  Cell     AC    FC
  <chr>  <chr> <dbl> <dbl>
1 abts-1 MSx1   94.9  6.81
2 acs-13 MSx1   69.1  8.2 

Some data used:

#Data
df <- structure(list(Genes = c("abts-1", "acp-2", "acp-2", "acs-13", 
"acs-22", "acs-22", "acs-3", "acs-3", "add-2", "add-2"), Cell = c("MSx1", 
"Ea", "Ep", "MSx1", "Ea", "Ep", "Ea", "Ep", "Ea", "Ep"), AC = c(94.9, 
301, 188, 69.1, 176, 64.3, 156, 75.5, 123, 125), FC = c(6.81, 
32.4, 20.6, 8.2, 19.4, 7.7, 17.2, 8.87, 6.62, 6.69)), row.names = c(NA, 
-10L), class = "data.frame")
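
If you would rather not depend on dplyr, the same filter can be sketched in base R. duplicated() only flags the second and later occurrences of a value, which is why !duplicated() on its own still keeps one row per repeated gene; adding a second pass with fromLast = TRUE flags every occurrence. The names is_dup and unique_only below are just illustrative, not part of the original code.

```r
# Same example data as in this answer, rebuilt so the snippet runs on its own
df <- data.frame(
  Genes = c("abts-1", "acp-2", "acp-2", "acs-13", "acs-22",
            "acs-22", "acs-3", "acs-3", "add-2", "add-2"),
  Cell  = c("MSx1", "Ea", "Ep", "MSx1", "Ea", "Ep", "Ea", "Ep", "Ea", "Ep"),
  AC    = c(94.9, 301, 188, 69.1, 176, 64.3, 156, 75.5, 123, 125),
  FC    = c(6.81, 32.4, 20.6, 8.2, 19.4, 7.7, 17.2, 8.87, 6.62, 6.69)
)

# duplicated() marks only repeats after the first hit, so scan from both ends
# to mark every row whose gene appears more than once
is_dup <- duplicated(df$Genes) | duplicated(df$Genes, fromLast = TRUE)

# Keep only the rows whose gene occurs exactly once (hypothetical result name)
unique_only <- df[!is_dup, ]
```

On the example data this should leave only the abts-1 and acs-13 rows, keeping their original row names 1 and 4. Unlike the dplyr result, it is a plain, ungrouped data frame.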
Duck