
I need to find the unique entries in my data frame based on columns ID and Genus. Column Count should not be considered when deciding uniqueness. My data frame is structured like this:

ID    Genus      Count
A     Genus1     4
A     Genus18    265
A     Genus28    1
A     Genus2     900
B     Genus1     85
B     Genus18    9
B     Genus28    24
B     Genus2     6
B     Genus3000  152
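For a reproducible example, the table above could be entered with `tibble::tribble()`. This is a sketch that assumes the tibble package is installed (it ships with tidyverse) and reuses the name `mydata` from the attempts below.

```r
# Assumes the tibble package (installed as part of tidyverse)
mydata <- tibble::tribble(
  ~ID, ~Genus,      ~Count,
  "A", "Genus1",        4,
  "A", "Genus18",     265,
  "A", "Genus28",       1,
  "A", "Genus2",      900,
  "B", "Genus1",       85,
  "B", "Genus18",       9,
  "B", "Genus28",      24,
  "B", "Genus2",        6,
  "B", "Genus3000",   152
)
```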

The resulting data frame should contain only this row:

ID    Genus      Count
B     Genus3000  152

because it is the only row whose Genus does not also appear under another ID.

I have tidyverse loaded but am having trouble getting the result I need. I tried distinct(), but I keep getting every row of the input back as output.

I have tried the following:

uniquedata <- mydata %>% distinct(.keep_all = TRUE)
uniquedata <- mydata %>% group_by(ID, Genus) %>% distinct(.keep_all = TRUE)
uniquedata <- mydata %>% distinct(ID, Genus, .keep_all = TRUE)
uniquedata <- mydata %>% distinct()
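A quick check on the sample data suggests why these attempts return every row: each (ID, Genus) pair already occurs exactly once, so distinct() has nothing to remove, and the repetition lives in the Genus column alone. This sketch assumes dplyr and tibble are installed and rebuilds the question's data as `mydata`.

```r
library(dplyr)

# Same example data as in the question
mydata <- tibble::tribble(
  ~ID, ~Genus,      ~Count,
  "A", "Genus1",        4,
  "A", "Genus18",     265,
  "A", "Genus28",       1,
  "A", "Genus2",      900,
  "B", "Genus1",       85,
  "B", "Genus18",       9,
  "B", "Genus28",      24,
  "B", "Genus2",        6,
  "B", "Genus3000",   152
)

# Every (ID, Genus) pair occurs exactly once, so distinct() has nothing to drop
nrow(distinct(mydata, ID, Genus)) == nrow(mydata)  # TRUE

# The repetition is in Genus alone: n is 2 for every Genus except Genus3000
count(mydata, Genus)
```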

What should I use to achieve my desired output?

aminards

2 Answers


We could use add_count() to count how often each Genus occurs, then filter() to keep only the rows whose Genus occurs once:

library(dplyr)

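df %>% 
    add_count(Genus) %>%      # adds column n = number of rows per Genus
    filter(n == 1) %>%        # keep rows whose Genus occurs only once
    select(ID, Genus, Count)  # drop the helper column n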

Output:

  ID    Genus     Count
  <chr> <chr>     <dbl>
1 B     Genus3000   152
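The add_count(Genus) approach keeps rows whose Genus occurs in exactly one row. If a Genus could also repeat within a single ID (not the case in the sample data) and should still count as unique to that ID, one variant, offered here as an alternative sketch rather than part of the original answer, filters on the number of distinct IDs per Genus with n_distinct(). It assumes dplyr and tibble are installed and that `df` holds the question's data.

```r
library(dplyr)

# Same example data as in the question
df <- tibble::tribble(
  ~ID, ~Genus,      ~Count,
  "A", "Genus1",        4,
  "A", "Genus18",     265,
  "A", "Genus28",       1,
  "A", "Genus2",      900,
  "B", "Genus1",       85,
  "B", "Genus18",       9,
  "B", "Genus28",      24,
  "B", "Genus2",        6,
  "B", "Genus3000",   152
)

# Keep rows whose Genus occurs under exactly one distinct ID.
# Unlike add_count(Genus), this keeps a Genus repeated within a single ID.
result <- df %>%
  group_by(Genus) %>%
  filter(n_distinct(ID) == 1) %>%
  ungroup()

result
```

On the sample data both approaches return the same single Genus3000 row; they differ only when a Genus is repeated inside one ID.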
TarJae

For the given data set, it is enough to check the column "Genus" for values appearing more than once and then to remove the corresponding rows from the data frame.

library(dplyr)
countGenus <- df %>% count(Genus)  # number of rows per Genus
filter(df, Genus %in% filter(countGenus, n == 1)$Genus)  # keep Genus seen once
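The same idea can be expressed in base R without dplyr. This sketch swaps the explicit count for duplicated(), scanning the Genus column in both directions so that every occurrence of a repeated Genus is flagged; the data frame `df` is rebuilt here from the question's table.

```r
# Same example data as in the question, as a base R data frame
df <- data.frame(
  ID    = rep(c("A", "B"), times = c(4, 5)),
  Genus = c("Genus1", "Genus18", "Genus28", "Genus2",
            "Genus1", "Genus18", "Genus28", "Genus2", "Genus3000"),
  Count = c(4, 265, 1, 900, 85, 9, 24, 6, 152)
)

# duplicated() flags repeats scanning forwards; fromLast = TRUE flags them
# scanning backwards. A Genus that occurs only once is flagged by neither.
repeated <- duplicated(df$Genus) | duplicated(df$Genus, fromLast = TRUE)
result <- df[!repeated, ]

result
```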
saz