I have a data frame df like so:

food    color   popular
apple   red     no
pear    green   no
banana  yellow  yes
apple   red     yes

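How do I get only the non-duplicated rows, based on just two columns (food and color)? Any row whose food/color combination occurs more than once should be dropped entirely.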

Expected result:

food    color   popular
pear    green   no
banana  yellow  yes

I tried:

df %>% distinct(food, color, .keep_all = TRUE)

but this doesn't give me my expected result: it keeps one of the apple rows instead of dropping both.

1 Answer

library(dplyr)

# Create test data
df <- tibble(
    food = c("apple", "pear", "banana", "apple"),
    color = c("red", "green", "yellow", "red"),
    popular = c(FALSE, FALSE, TRUE, TRUE)
)

df %>%
    # Make a group for each combination of food and color
    group_by(food, color) %>%
    # Then keep only the groups with exactly 1 row, which drops
    # every group that contains duplicates
    filter(n() == 1) %>%
    ungroup()

#> # A tibble: 2 × 3
#>   food   color  popular
#>   <chr>  <chr>  <lgl>
#> 1 pear   green  FALSE
#> 2 banana yellow TRUE
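
For comparison, the same result can probably be reached without dplyr. The sketch below is a base R alternative to the group_by()/filter() approach above, not part of the original answer; the names key, is_dup and result are chosen here purely for illustration, and the data is rebuilt from the question's table with popular kept as "yes"/"no" strings.

```r
# Same example data as in the question, as a plain data frame
df <- data.frame(
  food    = c("apple", "pear", "banana", "apple"),
  color   = c("red", "green", "yellow", "red"),
  popular = c("no", "no", "yes", "yes")
)

# Only these columns decide whether two rows count as duplicates
key <- df[c("food", "color")]

# duplicated() flags the second and later occurrences of a key;
# running it again with fromLast = TRUE flags the earlier occurrences.
# The union therefore marks every row whose key appears more than once.
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Keep only the rows whose food/color combination is unique
result <- df[!is_dup, ]
print(result)
```

Unlike distinct(), which keeps one representative row per combination, this drops every member of a duplicated combination, so both apple rows disappear.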
Migwell