I would like to identify which observations are duplicates based on the values of one variable. However, I want every observation involved in a duplicate to be flagged, not just the second and later occurrences. For example:

x <- c(1,2,3,4,5,7,5,7)
duplicated(x)
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

Rather than flagging only the last two elements, I would like the last four elements to be flagged, along with which elements match each other (e.g. elements 5 and 7, and elements 6 and 8). Thanks.

coding_heart

2 Answers

You can use duplicated twice:

duplicated(x) | duplicated(x, fromLast = TRUE)
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
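
To see why the two calls combine this way, it may help to look at each pass on its own. This is only an illustrative sketch; the names forward, backward and all_dups are made up here and are not part of the answer above.

```r
x <- c(1, 2, 3, 4, 5, 7, 5, 7)

# Scanning left to right flags every occurrence after the first.
forward <- duplicated(x)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

# Scanning right to left flags every occurrence before the last.
backward <- duplicated(x, fromLast = TRUE)
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE

# An element that is flagged in either direction has at least one twin.
all_dups <- forward | backward
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
```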
Sven Hohenstein

You could try a table

x <- c(1,2,3,4,5,7,5,7)
tab <- table(x) > 1
x[x %in% names(which(tab))]
# [1] 5 7 5 7
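
Note that names(which(tab)) is a character vector, so the %in% comparison above relies on R coercing x to character. If that round trip is a concern (for example with non-integer values), one possible alternative is to count group sizes with ave(). This is a sketch of a different technique, not the table method itself, and the name counts is made up for illustration.

```r
x <- c(1, 2, 3, 4, 5, 7, 5, 7)

# For each element, compute how many times its value occurs in x.
counts <- ave(seq_along(x), x, FUN = length)

# Keep the elements whose value occurs more than once.
x[counts > 1]
# [1] 5 7 5 7
```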

Another method inspired by @rawr's comment is

x %in% x[duplicated(x)]
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
x[ x %in% x[duplicated(x)] ]
# [1] 5 7 5 7
which(x %in% x[duplicated(x)])
# [1] 5 6 7 8
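
The question also asks which elements match each other. One way to get that, assuming a list of positions per duplicated value is an acceptable output shape, is to split the duplicated positions by their value. The names is_dup and groups are hypothetical and chosen only for this sketch.

```r
x <- c(1, 2, 3, 4, 5, 7, 5, 7)

# Flag every element that has at least one twin.
is_dup <- duplicated(x) | duplicated(x, fromLast = TRUE)

# Group the positions of flagged elements by the value they share.
groups <- split(which(is_dup), x[is_dup])
groups
# $`5`
# [1] 5 7
#
# $`7`
# [1] 6 8
```

Each list entry is named after a duplicated value and holds the positions where it occurs, so positions 5 and 7 match, as do positions 6 and 8.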
Rich Scriven