In R, how can I create a subset data frame with all duplicate observations?

Question

Lots of questions out there touch on the topic of duplicate observations, but none of them has worked for me so far.

In this question I learned how to select all duplicates from a vector.

# vector
id <- c("a","b","b","c","c","c","d","d","d","d")

#To return ALL duplicated values by specifying fromLast argument:
id[duplicated(id) | duplicated(id, fromLast=TRUE)]
## [1] "b" "b" "c" "c" "c" "d" "d" "d" "d"

#Yet another way to return ALL duplicated values, using %in% operator:
id[id %in% unique(id[duplicated(id)])]
## [1] "b" "b" "c" "c" "c" "d" "d" "d" "d"

Now having a data frame like this one:

dat <- data.frame(x = c(1, 1, 2, 2, 3), 
                  y = c(5, 5, 6, 7, 8), 
                  z = c('a', 'b', 'c', 'd', 'e'))

How could I select all observations that simultaneously have duplicate values of x and y, irrespective of z?

If the word `simultaneously` has some meaning , none of the answers below has taken that into consideration. If it does not then it is duplicate of the above question marked by @thelatemail. — Ronak Shah, Nov 22 '17 at 02:57
@RonakShah perhaps it wasn't greatest choice of word and didn't mean more than `in the same time` — radek, Nov 22 '17 at 03:22
@thelatemail I missed this one! :/ Indeed does what I need too so if the answers are duplicate the question could be closed I guess. — radek, Nov 22 '17 at 03:23

score 5 · Accepted Answer · answered Nov 22 '17 at 02:53

Another option using dplyr

library(dplyr)
dat %>% group_by(x,y) %>% filter(n()>1) 

# A tibble: 2 x 3
#     x     y      z
#   <dbl> <dbl> <fctr>
#1    1     5      a
#2    1     5      b
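
The result of the pipeline above is still grouped by x and y. A minimal sketch, assuming dplyr is installed and that an ungrouped result is wanted; `dups` is an illustrative name, not part of the original answer:

```r
library(dplyr)

dat <- data.frame(x = c(1, 1, 2, 2, 3),
                  y = c(5, 5, 6, 7, 8),
                  z = c("a", "b", "c", "d", "e"))

# Keep every row whose (x, y) pair occurs more than once,
# then drop the grouping so later verbs act on the whole table.
dups <- dat %>%
  group_by(x, y) %>%
  filter(n() > 1) %>%
  ungroup()
```

Without `ungroup()`, a later `summarise()` or `mutate()` would still operate per (x, y) group.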

Santosh M.


score 4 · Answer 2 · answered Nov 22 '17 at 02:49

You can use data.table like so:

library(data.table)
setDT(dat)
# selects all (x,y) pairs that occur more than once
dat[ , if (.N > 1L) .SD, by = .(x, y)]
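
Note that `setDT()` converts `dat` to a data.table by reference. A sketch, assuming data.table is installed and that you would rather leave the original data frame untouched; `dt` and `dups` are illustrative names:

```r
library(data.table)

dat <- data.frame(x = c(1, 1, 2, 2, 3),
                  y = c(5, 5, 6, 7, 8),
                  z = c("a", "b", "c", "d", "e"))

# as.data.table() returns a copy, so `dat` stays a plain data.frame
dt <- as.data.table(dat)

# .N is the number of rows in each (x, y) group and .SD holds the
# group's remaining columns (here z). Groups with a single row
# return NULL and therefore drop out of the result.
dups <- dt[, if (.N > 1L) .SD, by = .(x, y)]
```

The grouping columns x and y come first in the result, followed by the columns from `.SD`.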

MichaelChirico


score 2 · Answer 3 · answered Nov 22 '17 at 02:53

In base R

# count the rows in each (x, y) pair and keep pairs that occur more than once
dat[ave(seq_len(nrow(dat)), dat$x, dat$y, FUN = length) > 1, ]
  x y z
1 1 5 a
2 1 5 b
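
The `duplicated()` approach from the question also carries over to data frames when it is applied to the key columns only. A sketch in base R, assuming that only x and y define a duplicate; `key`, `dup` and `res` are illustrative names:

```r
dat <- data.frame(x = c(1, 1, 2, 2, 3),
                  y = c(5, 5, 6, 7, 8),
                  z = c("a", "b", "c", "d", "e"))

# Restrict the comparison to the columns that define a duplicate
key <- dat[c("x", "y")]

# Scanning forwards flags every repeat except the first occurrence;
# scanning backwards (fromLast = TRUE) flags every one except the last.
# Their union marks all rows of every repeated (x, y) pair.
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

res <- dat[dup, ]
res
##   x y z
## 1 1 5 a
## 2 1 5 b
```

Because z is left out of `key`, rows count as duplicates irrespective of their z values.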

BENY

