Finding duplicate IDs based on the values of a column

Question

I'm trying to find a way to do this. It seems like it should be simple enough, but I'm struggling.

ID Color   
1 Blue  
2 Red  
2 Green  
2 Blue  
1 Green  
3 Red  
3 Blue

I'd like to keep only the duplicated IDs whose rows are both blue and green. So in my example, only ID 1.

Edit: Sorry, I should have been clearer. ID 2 isn't in the output because it also has the red value. I'm looking for duplicated IDs with only blue and green values.

Is there a way to do this?

@zx8754 not exactly the right target. As explained, answer from the target would also give group 2 which OP doesn't want. — Ronak Shah, Dec 14 '18 at 09:22
@RonakShah it is not 100% dupe, but related, plus this post now has valid answers. Feel free to re-open, of course. — zx8754, Dec 14 '18 at 09:26

score 3 · Answer 1 · answered Dec 14 '18 at 09:09

Using base R ave, we select the IDs whose Color values are all either "Blue" or "Green".

df[with(df, ave(Color == "Blue" | Color == "Green", ID, FUN = all)), ]

#  ID Color
#1  1  Blue
#5  1 Green
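To make the condition above easier to inspect, here is a minimal base R sketch. It assumes the question's data is stored in `df` and adds a hypothetical ID 4 that has only Blue rows (not in the original post) to show that the `ave` test on its own checks "nothing other than Blue/Green", not "both Blue and Green". The names `only_bg`, `has_both`, `loose` and `strict` are made up for illustration.

```r
# Question's data plus a hypothetical ID 4 with only Blue rows (added for illustration)
df <- data.frame(
  ID    = c(1L, 2L, 2L, 2L, 1L, 3L, 3L, 4L, 4L),
  Color = c("Blue", "Red", "Green", "Blue", "Green", "Red", "Blue", "Blue", "Blue"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose ID group contains nothing but Blue/Green
only_bg <- with(df, as.logical(ave(Color %in% c("Blue", "Green"), ID, FUN = all)))

# TRUE for every row whose ID group contains both Blue and Green.
# ave() is run over row indices so each group's colors can be looked up;
# the logical result is stored as 0/1 in the integer vector, hence "== 1".
has_both <- with(df, ave(seq_along(ID), ID,
                         FUN = function(i) all(c("Blue", "Green") %in% Color[i]))) == 1

loose  <- df[only_bg, ]             # IDs with no color other than Blue/Green
strict <- df[only_bg & has_both, ]  # additionally requires both colors to be present
```

On the question's original seven rows the two filters agree; they differ only when an ID has just one of the two colors.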

Ronak Shah

akrun · Accepted Answer · 2018-12-14T09:08:29.450

After grouping by 'ID', check that both 'Blue' and 'Green' are %in% the 'Color' column and that the group has exactly two distinct 'Color' values, then filter the rows on that condition.

library(dplyr)
df1 %>%
   group_by(ID) %>%
   filter(all(c("Blue", "Green") %in% Color) && n_distinct(Color) == 2)
# A tibble: 2 x 2
# Groups:   ID [1]
#    ID Color
#  <int> <chr>
#1     1 Blue 
#2     1 Green

data

df1 <- structure(list(ID = c(1L, 2L, 2L, 2L, 1L, 3L, 3L), Color = c("Blue", 
"Red", "Green", "Blue", "Green", "Red", "Blue")),
   class = "data.frame", row.names = c(NA, 
-7L))

edited Dec 14 '18 at 09:08

answered Dec 14 '18 at 09:03

akrun

Seems like setequal should be an equivalent condition? – Frank Dec 14 '18 at 09:16
@Frank That works `setDT(df1)[, .SD[setequal(Color, c("Blue", "Green"))], ID]` – akrun Dec 14 '18 at 09:18
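For readers without data.table, the `setequal` idea from the comments can be sketched in base R. This is only an illustration under stated assumptions: `keep_exact_colors` is a hypothetical helper name (not from any package), and it assumes the data frame has columns named `ID` and `Color` as in the data block above.

```r
# Same data as in the accepted answer's data block
df1 <- data.frame(
  ID    = c(1L, 2L, 2L, 2L, 1L, 3L, 3L),
  Color = c("Blue", "Red", "Green", "Blue", "Green", "Red", "Blue"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows of every ID whose set of colors equals `wanted`
keep_exact_colors <- function(data, wanted) {
  # One TRUE/FALSE per ID: is the group's color set exactly `wanted`?
  ok <- tapply(data$Color, data$ID, function(x) setequal(x, wanted))
  # tapply() names its result by ID, so the names of the TRUE entries are the IDs to keep
  ids <- names(ok)[as.vector(ok)]
  data[as.character(data$ID) %in% ids, , drop = FALSE]
}

res <- keep_exact_colors(df1, c("Blue", "Green"))
```

Because `setequal` compares sets, it covers both parts of the accepted answer's condition at once: both colors must be present, and no other color may appear.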
