I'm trying to remove all rows that are duplicated across multiple variables using dplyr. Here's how I do it without dplyr:

dat = data.frame(id=c(1,1,2),date=c(1,1,1))
dat = dat[!(duplicated(dat[c('id','date')]) | duplicated(dat[c('id','date')],fromLast=TRUE)),]

It should return only the row with id 2.

NelsonGon
spazznolo

1 Answer

This can be done with a group_by/filter operation in the tidyverse: group by the columns of interest, then keep only the groups that contain a single row. Here group_by_all is used because every column in the dataset is a grouping column. If only a subset of the columns is needed, group_by_at can be used instead.

library(dplyr)
dat %>% 
   group_by_all() %>%
   filter(n()==1)
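
As a minimal sketch of the group_by_at variant mentioned above: the data frame dat2 and its extra value column are hypothetical, added only so that the grouping columns are a proper subset of the dataset. Note that group_by_at is superseded in dplyr 1.0.0 and later, so this assumes a version where it is still available.

```r
library(dplyr)

# Hypothetical data: 'value' is an extra column that is not in the original example
dat2 <- data.frame(id = c(1, 1, 2), date = c(1, 1, 1), value = c(10, 20, 30))

res <- dat2 %>%
  group_by_at(vars(id, date)) %>%  # group only by the selected columns
  filter(n() == 1) %>%             # keep groups that contain exactly one row
  ungroup()                        # drop the grouping before further use
```

Only the row with id 2 should remain, because the (id, date) pair (1, 1) occurs twice even though the value column differs.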

Or simply use group_by with the column names:

dat %>% 
   group_by(id, date) %>%
   filter(n() == 1)

If the OP intended to use the duplicated function:

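dat %>%
  # compare whole (id, date) rows; a per-column test would drop rows whose
  # combination is unique but whose individual values repeat
  filter(!(duplicated(data.frame(id, date)) |
           duplicated(data.frame(id, date), fromLast = TRUE)))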
# id date
#1  2    1
akrun