Keeping only rows which are duplicated once only

Question

I have a data set that looks as follows:

A         B      C 
liver     5      RX
blood     9      DK 
liver     7      DK
intestine 5      RX
blood     3      DX
blood     1      DX
skin      2      RX
skin      2      DX

I want to keep only the entries whose value in A is duplicated exactly once, i.e. occurs exactly twice (not triplicates and so on). If a value in A occurs exactly twice, the entire row should be kept for both occurrences.

The ideal output will look like:

A         B      C 
liver     5      RX
liver     7      DK
skin      2      RX
skin      2      DX

I tried the following code with dplyr:

df %>% group_by(A) %>% filter(n() >= 1)

Could someone please help me here?

tmfmnk · Accepted Answer · 2019-04-15T19:30:13.520


You can do:

df %>%
 group_by(A) %>%
 filter(n() == 2)

  A         B C    
  <chr> <int> <chr>
1 liver     5 RX   
2 liver     7 DK   
3 skin      2 RX   
4 skin      2 DX
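For a reproducible check, here is a minimal sketch that rebuilds the sample data from the question and applies the same filter. The column types (integer `B`, character `A` and `C`) are assumed from the printed output above, and the trailing `ungroup()` is an optional addition that returns a plain tibble:

```r
library(dplyr)

# Sample data re-typed from the question; column types are an assumption
df <- tibble(
  A = c("liver", "blood", "liver", "intestine", "blood", "blood", "skin", "skin"),
  B = c(5L, 9L, 7L, 5L, 3L, 1L, 2L, 2L),
  C = c("RX", "DK", "DK", "RX", "DX", "DX", "RX", "DX")
)

# Keep only the groups of A that contain exactly two rows
result <- df %>%
  group_by(A) %>%
  filter(n() == 2) %>%
  ungroup()  # optional: drop the grouping once filtering is done

print(result)
```

Note that `n() >= 1`, as in the question, is true for every group, so it keeps all rows; `n() == 2` is what restricts the result to values of `A` that occur exactly twice.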

Or a more verbose way to do the same:

df %>%
 add_count(A) %>%
 filter(n == 2) %>%
 select(-n)

Or:

df %>%
 group_by(A) %>%
 filter(max(row_number()) == 2)

If you instead want values of "A" that occur in exactly two distinct rows, so that fully identical rows are counted only once:

df %>%
 group_by(A) %>%
 distinct() %>%
 filter(n() == 2)
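The difference between the plain count and the `distinct()` variant only shows up when the data contain fully identical rows. The sketch below uses a small made-up data frame (`df2` and its values are hypothetical and not from the question) to illustrate it:

```r
library(dplyr)

# Hypothetical data: "liver" has 3 rows, two of them identical;
# "skin" has 2 rows that are identical to each other
df2 <- tibble(
  A = c("liver", "liver", "liver", "skin", "skin"),
  B = c(5, 5, 7, 2, 2),
  C = c("RX", "RX", "DK", "RX", "RX")
)

# Plain count: keeps groups with exactly two rows, identical or not
plain <- df2 %>%
  group_by(A) %>%
  filter(n() == 2) %>%
  ungroup()

# With distinct(): identical rows are collapsed before counting
deduped <- df2 %>%
  group_by(A) %>%
  distinct() %>%
  filter(n() == 2) %>%
  ungroup()

print(plain)    # the two identical "skin" rows
print(deduped)  # the two distinct "liver" rows
```

Which variant is appropriate depends on whether repeated identical rows should count as separate occurrences of a value in `A`.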

edited Apr 15 '19 at 19:30

answered Apr 15 '19 at 18:45



Perfect. Thank you so much for the quick response. – KVC_bioinfo Apr 15 '19 at 18:54
If this post helped, please accept it. – tmfmnk Apr 15 '19 at 19:51
