Identifying which values are duplicates in R

Question

I would like to identify which observations are duplicates based on the values of one variable. However, I would like every observation involved in a duplicate to be identified, not just the second occurrence. For example:

x <- c(1,2,3,4,5,7,5,7)
duplicated(x)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

Rather than identifying only the last two elements, I would like all four of the last elements to be identified, along with which elements match each other (e.g. elements 5 and 7, and elements 6 and 8).

I'd like it to work regardless of how the values are ordered. – coding_heart, Oct 26 '14 at 15:15

score 2 · Answer 1 · answered Oct 26 '14 at 15:28


You can use duplicated twice, once scanning from the start and once from the end:

duplicated(x) | duplicated(x, fromLast = TRUE)
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
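It may help to look at the two scans separately. The forward scan flags every copy after the first, and the backward scan flags every copy before the last, so their union covers all copies. The sketch below uses the question's x plus a hypothetical reordered vector y to check that the result does not depend on where the duplicates sit (the concern in the comment on the question).

```r
x <- c(1, 2, 3, 4, 5, 7, 5, 7)

# Forward scan: TRUE for elements whose value already appeared earlier
fwd <- duplicated(x)
# Backward scan: TRUE for elements whose value appears again later
bwd <- duplicated(x, fromLast = TRUE)

fwd
# [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
bwd
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
which(fwd | bwd)
# [1] 5 6 7 8

# Hypothetical reordered vector: the combined test still flags every copy
y <- c(7, 1, 5, 2, 7, 3, 5, 4)
which(duplicated(y) | duplicated(y, fromLast = TRUE))
# [1] 1 3 5 7
```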


Sven Hohenstein


Rich Scriven · Accepted Answer · score 1 · 2014-10-26T15:39:41.177

You could try using table():

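x <- c(1,2,3,4,5,7,5,7)
tab <- table(x) > 1            # TRUE for each distinct value occurring more than once
x[x %in% names(which(tab))]    # keep elements whose value is duplicated (names are character)
# [1] 5 7 5 7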
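The table approach above returns the duplicated values rather than their positions. One variation (not part of the original answer) is to look each element up in the table by name, which gives a per-element count aligned with x; positions then follow from which(). This is a minimal sketch; counts and per_element are illustrative names, and it relies on table() naming its cells with as.character() of the values.

```r
x <- c(1, 2, 3, 4, 5, 7, 5, 7)

counts <- table(x)
# Look up each element's count; a 1-D table can be indexed by its character names
per_element <- as.vector(counts[as.character(x)])
per_element
# [1] 1 1 1 1 2 2 2 2

# Positions of elements whose value occurs more than once
which(per_element > 1)
# [1] 5 6 7 8
```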

Another method, inspired by @rawr's comment:

x %in% x[duplicated(x)]
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
x[ x %in% x[duplicated(x)] ]
# [1] 5 7 5 7
which(x %in% x[duplicated(x)])
# [1] 5 6 7 8
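Neither approach shows which elements match each other, which the question also asked for. A minimal sketch in base R (assuming R 3.2.0 or later for lengths()) groups the positions of x by value with split() and keeps the groups with more than one member; positions and matches are illustrative names.

```r
x <- c(1, 2, 3, 4, 5, 7, 5, 7)

# Group element positions by value (split() converts x to a factor)
positions <- split(seq_along(x), x)
# Keep only values that occur more than once
matches <- positions[lengths(positions) > 1]

names(matches)
# [1] "5" "7"
matches[["5"]]
# [1] 5 7
matches[["7"]]
# [1] 6 8
```

Because split() groups by value rather than by position, the pairing does not depend on the order of x.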

edited Oct 26 '14 at 15:39

answered Oct 26 '14 at 15:18

Rich Scriven

