1

I'm trying to find a way to do this, it seems like it should be simple enough but I'm struggling.

ID Color   
1 Blue  
2 Red  
2 Green  
2 Blue  
1 Green  
3 Red  
3 Blue

I'd like to keep only the duplicate rows which are both blue and green. So in my example only the ID 1.

edit : Sorry, should've been clearer, 2 isn't an output because it also hase the red value. I'm looking for duplicated rows with only blue and green values.

Is there a way to do this?

Sotos
  • 51,121
  • 6
  • 32
  • 66
A.Ch
  • 45
  • 3
  • 1
    @zx8754 not exactly the right target. As explained, answer from the target would also give group 2 which OP doesn't want. – Ronak Shah Dec 14 '18 at 09:22
  • 1
    @RonakShah it is not 100% dupe, but related, plus this post now has valid answers. Feel free to re-open, of course. – zx8754 Dec 14 '18 at 09:26

2 Answers2

3

Using base R ave, we select those ID that have only Color "Blue" OR "Green" in them.

df[with(df, ave(Color == "Blue" | Color == "Green", ID, FUN = all)), ]

#  ID Color
#1  1  Blue
#5  1 Green
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

After grouping by 'ID', check whether all the 'Blue' and 'Green' values are %in% 'Color' column and only there are two distinct 'Color' categories to filter the rows

library(dplyr)
df1 %>%
   group_by(ID) %>%
   filter(all(c("Blue", "Green") %in% Color  & n_distinct(Color) == 2))
# A tibble: 2 x 2
# Groups:   ID [1]
#    ID Color
#  <int> <chr>
#1     1 Blue 
#2     1 Green

data

df1 <- structure(list(ID = c(1L, 2L, 2L, 2L, 1L, 3L, 3L), Color = c("Blue", 
"Red", "Green", "Blue", "Green", "Red", "Blue")),
   class = "data.frame", row.names = c(NA, 
-7L))
akrun
  • 874,273
  • 37
  • 540
  • 662