Looking to remove both rows if duplicated in a column using dplyr

Question

I am attempting to create a new data frame that excludes every row whose value in the "id" column is duplicated.

I've tried a few options, but I would like to keep things consistent and take advantage of dplyr. I tried working with the distinct() function, but to no avail.

library(tidyverse)

df <- structure(list(id = c("1-2", "1-3", "1-3", "1-4", 
"1-5", "1-7", "1-7", "1-7", "1-9", 
"1-22"), award_amount = c(3000, 596500, 1125000, 5881515, 
155555, 686500, 207718, 250000, 750000, 3500000)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

s4 <- df %>%
  distinct(id, .keep_all = TRUE) 
s4

I would like every row that shares a duplicated value in the "id" column to be absent from the final table, including the first occurrence.

score 2 · Accepted Answer · answered Jun 14 '19 at 17:51

Here's one way using dplyr: group by id and keep only the groups that contain a single row.

df %>% 
  group_by(id) %>% 
  filter(n() == 1) %>% 
  ungroup()

# A tibble: 5 x 2
  id    award_amount
  <chr>        <dbl>
1 1-2           3000
2 1-4        5881515
3 1-5         155555
4 1-9         750000
5 1-22       3500000
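As a possible variation on the same idea, the group size can be attached as a column with add_count() and filtered on directly, which avoids the explicit group_by()/ungroup() pair. This is only a sketch, not part of the original answer; the helper column n and the name result are illustrative.

```r
library(dplyr)

# Same data as in the question, rebuilt with tibble() for brevity
df <- tibble(
  id = c("1-2", "1-3", "1-3", "1-4", "1-5", "1-7", "1-7", "1-7", "1-9", "1-22"),
  award_amount = c(3000, 596500, 1125000, 5881515, 155555,
                   686500, 207718, 250000, 750000, 3500000)
)

result <- df %>%
  add_count(id) %>%   # adds a column n: how many rows share this id
  filter(n == 1) %>%  # keep only ids that occur exactly once
  select(-n)          # drop the helper column again

result
```

The result should contain the same five ids as above (1-2, 1-4, 1-5, 1-9, 1-22), and it is not grouped, so no ungroup() is needed.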

score 1 · Answer 2 · answered Jun 14 '19 at 18:00

We can also use anyDuplicated() within each group (ungroup afterwards as required):

df %>% 
  group_by(id) %>% 
  filter(!anyDuplicated(id))

# Groups:   id [5]
  id    award_amount
  <chr>        <dbl>
1 1-2           3000
2 1-4        5881515
3 1-5         155555
4 1-9         750000
5 1-22       3500000
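To see why negating anyDuplicated() works as a filter condition, it may help to look at what it returns on its own. The base R function gives 0 when a vector has no duplicates and otherwise the index of the first duplicated element; negating a number with ! turns 0 into TRUE and any non-zero value into FALSE. A small sketch with values taken from the question (the variable names are illustrative):

```r
# A group whose id occurs once: no duplicates, so the result is 0
single <- anyDuplicated(c("1-2"))

# A group whose id occurs three times: index of the first repeat
triple <- anyDuplicated(c("1-7", "1-7", "1-7"))

# Negation coerces to logical: !0 is TRUE, !<non-zero> is FALSE
keep_single <- !single
keep_triple <- !triple

print(c(single = single, triple = triple))
print(c(keep_single = keep_single, keep_triple = keep_triple))
```

Inside filter() on a grouped data frame, that single TRUE or FALSE is recycled across the group, so each group is kept or dropped as a whole.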

Calum You · Answer 3 · 2019-06-14T18:12:36.737

Here's a slightly different way that avoids grouping, using a trick with duplicated(). On its own, duplicated() leaves the first occurrence of each value unflagged; checking from both ends of the vector and combining the results means every member of a duplicated set is flagged as TRUE. We can negate the result and then filter to the desired rows.

library(tidyverse)
df <- structure(list(id = c("1-2", "1-3", "1-3", "1-4", "1-5", "1-7", "1-7", "1-7", "1-9", "1-22"), award_amount = c(3000, 596500, 1125000, 5881515, 155555, 686500, 207718, 250000, 750000, 3500000)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
df %>%
  filter(!(duplicated(id) | duplicated(id, fromLast = TRUE)))
#> # A tibble: 5 x 2
#>   id    award_amount
#>   <chr>        <dbl>
#> 1 1-2           3000
#> 2 1-4        5881515
#> 3 1-5         155555
#> 4 1-9         750000
#> 5 1-22       3500000
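The intermediate logical vectors make the trick easier to follow. The sketch below uses only base R on the id values from the question; the names from_start, from_end, and dup_any are illustrative and not part of the answer's code.

```r
id <- c("1-2", "1-3", "1-3", "1-4", "1-5", "1-7", "1-7", "1-7", "1-9", "1-22")

# Flags every occurrence except the first of each value
from_start <- duplicated(id)

# Flags every occurrence except the last of each value
from_end <- duplicated(id, fromLast = TRUE)

# Combined: TRUE for every element whose value occurs more than once
dup_any <- from_start | from_end

print(from_start)
print(from_end)
print(id[!dup_any])
```

Only the ids that appear exactly once survive the final subsetting step, which is what the filter() call relies on.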

Created on 2019-06-14 by the reprex package (v0.3.0)
