R - How do I remove duplicates without retaining a single duplicated row?

Question

I am trying to remove duplicates in a data frame of real estate sales data. Sometimes, one deed will contain many parcels as part of the sale. Because the data goes by parcel, one sale shows up as several rows, each different parcels but with the same deed number and sale price - we do not want these in our dataset.

I want to remove them entirely, without retaining a single row. Using duplicated() or unique() will remove the duplicates, but will retain one row of the duplicated values such that the new data frame contains unique values for that column. This will not work for us - we want to get rid of all duplicated rows entirely. So, how can I identify that deed_number and sale price are duplicated and get rid of all of these rows without retaining any?

score 2 · Accepted Answer · answered Mar 20 '19 at 19:57

2

You could do a count by your grouping variables and only keep the records with a count equal to one:

library(tidyverse)

df <- tibble(
  deed = c(1, 1, 1, 2, 3, 3, 4, 5),
  price = c(10, 10, 10, 20, 30, 30, 40, 50)
)
df
#> # A tibble: 8 x 2
#>    deed price
#>   <dbl> <dbl>
#> 1     1    10
#> 2     1    10
#> 3     1    10
#> 4     2    20
#> 5     3    30
#> 6     3    30
#> 7     4    40
#> 8     5    50

df %>%
  count(deed, price) %>%
  filter(n == 1)
#> # A tibble: 3 x 3
#>    deed price     n
#>   <dbl> <dbl> <int>
#> 1     2    20     1
#> 2     4    40     1
#> 3     5    50     1

^{Created on 2019-03-20 by the reprex package (v0.2.1)}

answered Mar 20 '19 at 19:57

JasonAizkalns

20,243
8
57
116

1

I guess this drops other columns (if there are any), but you could alternately do `df %>% group_by(deed, price) %>% filter(n() == 1)` – Frank Mar 20 '19 at 19:59
1

@Frank correct, or just use `add_count` instead. – JasonAizkalns Mar 20 '19 at 20:01
This worked perfectly, thank you. – gcbrianbrown Mar 20 '19 at 20:16

R - How do I remove duplicates *without* retaining a single duplicated row?

1 Answers1

R - How do I remove duplicates without retaining a single duplicated row?