Filtering dataframe based on two columns

Question

I have a dataframe df like so:

food    color   popular
apple   red     no
pear    green   no
banana  yellow  yes
apple   red     yes

How do I keep only the rows whose combination of two columns (food and color) is not duplicated anywhere else in the dataframe?

Expected result:

food    color   popular
pear    green   no
banana  yellow  yes

I tried:

df %>% distinct(food, color, .keep_all = TRUE)

but this doesn't give me my expected result: it keeps the first apple/red row instead of dropping both.

We could use `duplicated` i.e. `df[!duplicated(df[ ,c(1, 2)]), ]` — dario, Nov 03 '21 at 12:34
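
Note that `duplicated` on its own flags only the second and later copies of a row, so the expression in the comment above would still keep the first apple/red row. A minimal base R sketch that drops every row of a repeated pair, assuming the question's data rebuilt as a plain data.frame (the helper names `key`, `is_dup` and `result` are made up for illustration):

```r
# Same data as in the question
df <- data.frame(
  food    = c("apple", "pear", "banana", "apple"),
  color   = c("red", "green", "yellow", "red"),
  popular = c("no", "no", "yes", "yes")
)

# Only these two columns decide what counts as a duplicate
key <- df[, c("food", "color")]

# Scan from the top and from the bottom so that every member of a
# repeated (food, color) pair is flagged, including the first one
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

result <- df[!is_dup, ]
result
```

Combining the forward and `fromLast = TRUE` scans is what separates "drop all copies" from "keep one copy of each".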

score 1 · Answer 1 · answered Nov 03 '21 at 12:40

library(dplyr)

# Create test data
df <- tibble(
    food = c("apple", "pear", "banana", "apple"),
    color = c("red", "green", "yellow", "red"),
    popular = c(FALSE, FALSE, TRUE, TRUE)
)

df %>%
    # Make a group for each combination of food and color
    group_by(food, color) %>%
    # Keep only groups with exactly one row; any larger group
    # is a set of duplicates and is dropped entirely
    filter(n() == 1) %>%
    ungroup()

# A tibble: 2 × 3
  food   color  popular
  <chr>  <chr>  <lgl>  
1 pear   green  FALSE  
2 banana yellow TRUE   
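
A possible variant of the same idea, sketched here on the assumption that dplyr's `add_count()` is available: count the rows per (food, color) pair in a temporary column and filter on it, which avoids the explicit `group_by()`/`ungroup()` pair. The column name `n_rows` is made up for illustration.

```r
library(dplyr)

# Same test data, with logical values for popular
df <- tibble(
    food = c("apple", "pear", "banana", "apple"),
    color = c("red", "green", "yellow", "red"),
    popular = c(FALSE, FALSE, TRUE, TRUE)
)

result <- df %>%
    # Add a column holding how many rows share each food/color pair
    add_count(food, color, name = "n_rows") %>%
    # Keep the pairs that occur exactly once
    filter(n_rows == 1) %>%
    # Drop the helper column again
    select(-n_rows)

result
```

Both versions remove every copy of a repeated pair, whereas `distinct()` keeps one row per pair.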
