
I am trying to create a new data frame that excludes every row whose value in the "id" column is duplicated.

I've tried a few options, but I would like to keep things consistent and take advantage of dplyr. I tried the distinct() function, but it keeps the first row of each duplicated id rather than dropping them all.

library(tidyverse)

df <- structure(list(id = c("1-2", "1-3", "1-3", "1-4", 
"1-5", "1-7", "1-7", "1-7", "1-9", 
"1-22"), award_amount = c(3000, 596500, 1125000, 5881515, 
155555, 686500, 207718, 250000, 750000, 3500000)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

s4 <- df  %>% 
  distinct(id, .keep_all = TRUE) 
s4 

I would like every row whose "id" value is duplicated to be removed from the final table, including the first occurrence, so that only ids appearing exactly once remain.

Johnny Thomas

3 Answers


Here's one way using dplyr: group by id and keep only the groups that contain exactly one row.

df %>% 
  group_by(id) %>% 
  filter(n() == 1) %>% 
  ungroup()

# A tibble: 5 x 2
  id    award_amount
  <chr>        <dbl>
1 1-2           3000
2 1-4        5881515
3 1-5         155555
4 1-9         750000
5 1-22       3500000
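The same idea can be sketched without an explicit group_by(), assuming dplyr's add_count() is available. The temporary count column n and the result name unique_ids are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Same data as in the question, rebuilt with tibble()
df <- tibble(
  id = c("1-2", "1-3", "1-3", "1-4", "1-5", "1-7", "1-7", "1-7", "1-9", "1-22"),
  award_amount = c(3000, 596500, 1125000, 5881515, 155555,
                   686500, 207718, 250000, 750000, 3500000)
)

unique_ids <- df %>%
  add_count(id) %>%   # adds column n: how many rows share each id
  filter(n == 1) %>%  # keep ids that occur exactly once
  select(-n)          # drop the helper column again

unique_ids
```

This avoids a trailing ungroup() because add_count() returns an ungrouped result when the input is ungrouped.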
Shree

We can also use anyDuplicated() within each group (call ungroup() afterwards if required):

df %>% 
  group_by(id) %>% 
  filter(!anyDuplicated(id))
# A tibble: 5 x 2
# Groups:   id [5]
  id    award_amount
  <chr>        <dbl>
1 1-2           3000
2 1-4        5881515
3 1-5         155555
4 1-9         750000
5 1-22       3500000
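To see why negating anyDuplicated() works as a filter condition, here is a small base R sketch on made-up vectors (the names no_dups and has_dups are illustrative). anyDuplicated() returns 0 when there are no duplicates and the index of the first duplicate otherwise, so ! turns that into TRUE or FALSE.

```r
no_dups  <- c("a", "b", "c")
has_dups <- c("a", "b", "a")

idx_none <- anyDuplicated(no_dups)   # 0: nothing is repeated
idx_some <- anyDuplicated(has_dups)  # 3: the third element repeats the first

keep_none <- !idx_none  # TRUE, because !0 is TRUE
keep_some <- !idx_some  # FALSE, because any non-zero number negates to FALSE
```

Inside a grouped filter(), that single TRUE or FALSE is applied to every row of the group, so groups with a repeated id are dropped as a whole.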
NelsonGon

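Here's a slightly different way without grouping, using a trick with duplicated(). On its own, duplicated(id) flags every occurrence after the first, while duplicated(id, fromLast = TRUE) flags every occurrence before the last. Combining the two with | therefore flags every row whose id appears more than once. We can negate the result and then filter to the desired rows.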

library(tidyverse)
df <- structure(list(id = c("1-2", "1-3", "1-3", "1-4", "1-5", "1-7", "1-7", "1-7", "1-9", "1-22"), award_amount = c(3000, 596500, 1125000, 5881515, 155555, 686500, 207718, 250000, 750000, 3500000)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
df %>%
  filter(!(duplicated(id) | duplicated(id, fromLast = TRUE)))
#> # A tibble: 5 x 2
#>   id    award_amount
#>   <chr>        <dbl>
#> 1 1-2           3000
#> 2 1-4        5881515
#> 3 1-5         155555
#> 4 1-9         750000
#> 5 1-22       3500000

Created on 2019-06-14 by the reprex package (v0.3.0)
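As a rough illustration of how the two duplicated() calls combine, here is a base R sketch on a shortened, made-up id vector; the intermediate names from_start, from_end and any_dup are only for exposition.

```r
id <- c("1-2", "1-3", "1-3", "1-7", "1-7", "1-7")

from_start <- duplicated(id)                   # FALSE FALSE  TRUE FALSE  TRUE  TRUE
from_end   <- duplicated(id, fromLast = TRUE)  # FALSE  TRUE FALSE  TRUE  TRUE FALSE

# A value is flagged if it is a repeat in either direction
any_dup <- from_start | from_end               # FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

kept <- id[!any_dup]                           # only "1-2" survives
```

Neither call alone is enough: each leaves one occurrence of every repeated id unflagged, which is why both directions are needed.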

Calum You