How can I find the unique combinations based on two columns?

Question

I need to find the unique entries in my dataframe using column ID and Genus. I do not need to find unique values from column Count. My dataframe is structured like this:

ID    Genus    Count
A     Genus1   4
A     Genus18  265
A     Genus28  1
A     Genus2   900
B     Genus1   85
B     Genus18  9
B     Genus28  24
B     Genus2   6
B     Genus3000 152

The resulting dataframe would contain only the following row, because it is the only one that is unique by ID and Genus:

ID     Genus      Count
B      Genus3000  152

I have tidyverse loaded but have had trouble trying to get the result I need. I tried using distinct() but continue to get back all data from the input as output.

I have tried the following:

uniquedata <- mydata %>% distinct(.keep_all = TRUE)
uniquedata <- mydata %>% group_by(ID, Genus) %>% distinct(.keep_all = TRUE)
uniquedata <- mydata %>% distinct(ID, Genus, .keep_all = TRUE)
uniquedata <- mydata %>% distinct()

What should I use to achieve my desired output?

score 3 · Answer 1 · answered Jul 27 '21 at 17:32


We could use add_count in combination with filter:

library(dplyr)

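df %>%
    add_count(Genus) %>%          # add column n: number of rows sharing this Genus
    filter(n == 1) %>%            # keep rows whose Genus occurs only once
    select(ID, Genus, Count)      # drop the helper column n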

Output:

  ID    Genus     Count
  <chr> <chr>     <dbl>
1 B     Genus3000   152
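
For a reproducible check, here is a minimal sketch that rebuilds the sample data from the question and runs the same pipeline. The object names `only_once` and `pair_counts` are my own, and the data frame is built with base `data.frame()` rather than read from a file. It also counts the (ID, Genus) pairs, which may explain why `distinct()` returned every row: in this sample each pair occurs exactly once.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  ID    = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  Genus = c("Genus1", "Genus18", "Genus28", "Genus2",
            "Genus1", "Genus18", "Genus28", "Genus2", "Genus3000"),
  Count = c(4, 265, 1, 900, 85, 9, 24, 6, 152),
  stringsAsFactors = FALSE
)

# Rows whose Genus occurs in exactly one row of the whole table
only_once <- df %>%
  add_count(Genus) %>%   # helper column n = occurrences of each Genus
  filter(n == 1) %>%
  select(-n)             # drop the helper column again

# Counting on the (ID, Genus) pair instead: every pair occurs once,
# so filtering on these counts would not remove anything
pair_counts <- df %>% count(ID, Genus)
```

Here `only_once` holds the single Genus3000 row, while `pair_counts$n` is 1 for all nine pairs.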

TarJae


score 1 · Accepted Answer · answered Jul 27 '21 at 17:13


For the given data set, it is enough to count how often each value in the column "Genus" appears and then to keep only the rows whose value appears exactly once.

library(dplyr)

countGenus <- count(df, Genus)   # one row per Genus, with its frequency in column n
filter(df, Genus %in% filter(countGenus, n == 1)$Genus)
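
To see what the two steps do, it can help to inspect the intermediate counts before filtering. The following is a minimal sketch under the assumption that the data looks like the sample in the question; the name `singletons` and the use of `pull()` are my additions, not part of the code above.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  ID    = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  Genus = c("Genus1", "Genus18", "Genus28", "Genus2",
            "Genus1", "Genus18", "Genus28", "Genus2", "Genus3000"),
  Count = c(4, 265, 1, 900, 85, 9, 24, 6, 152),
  stringsAsFactors = FALSE
)

# One row per Genus with its number of occurrences, most frequent first
countGenus <- df %>% count(Genus, sort = TRUE)

# Genus values that occur exactly once, as a character vector
singletons <- countGenus %>% filter(n == 1) %>% pull(Genus)

# Keep only the rows whose Genus is one of those values
result <- df %>% filter(Genus %in% singletons)
```

If every `n` in `countGenus` turns out to be 1 on the real data, the filter keeps all rows, which would mean that no Genus value is repeated in that data.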

saz


That made sense, but unfortunately everything gets the value 1, so nothing is removed. – aminards Jul 27 '21 at 17:21
@aminards What do you mean by "every thing gets the value 1"? I tried the code and got the correct result – saz Jul 27 '21 at 17:27
I suppose I have to accept that I have no unique values! – aminards Jul 27 '21 at 21:07
