
I have a table with ~14000 entries. I want to identify entries that occur multiple times (i.e. 2+). I tried the `duplicated` function and, sure enough, got 55 entries that were duplicated (2x). However, I suspect that there could be entries with 3+ copies.

At this point, is there a function that can address this, or should I write my own method (i.e. with factors)?

Thank you.

Johnathan
  • No, it finds all duplicates. See [this question](http://stackoverflow.com/q/7854433/271616) and note the `fromLast` argument. – Joshua Ulrich Aug 18 '14 at 23:33

2 Answers


The `duplicated` function simply gives you a logical vector telling you which entries are duplicates of earlier entries and which are not. In that sense, it does not merely identify elements that appear exactly twice; it flags every repeat of any element that occurs more than once.

> a = c(1,2,3,4,5,5,5)
> duplicated(a)
[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

If you want to know exactly which values are duplicated, take the elements for which `duplicated` returns `TRUE` and apply `unique` to them.

unique(a[duplicated(a)])
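To go one step further and distinguish doubles from triples (the asker's concern), `table` gives the count of every value; a minimal sketch using the same vector:

```r
# Count how many times each value occurs, then keep only the repeated ones.
a <- c(1, 2, 3, 4, 5, 5, 5)
dup_values <- unique(a[duplicated(a)])  # values occurring 2+ times: 5
counts <- table(a)                      # occurrences of every distinct value
counts[counts > 1]                      # 5 occurs 3 times
```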
merlin2011

If all you care about is finding the values that have more than 1 copy in your vector/table, then something like this would work:

x <- sample(1:10, 20, replace=TRUE)
as.integer(names(which(table(x) > 1)))

If, however, you want the indices/positions of the 2nd, 3rd, etc. occurrences, then `duplicated()` does the trick (also for 3+).

sel <- which(duplicated(x))
x[sel]

However, do note that `duplicated()` does not tag the 1st occurrence as a duplicate; this may cause some confusion.
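If you do want every occurrence flagged, first one included, you can combine `duplicated()` with its `fromLast` argument (mentioned in the comment above). A small sketch with a made-up vector:

```r
# Flag ALL copies of repeated values, including the first occurrence,
# by scanning the vector from both directions.
x <- c(10, 20, 10, 30, 10, 20)
all_dups <- duplicated(x) | duplicated(x, fromLast = TRUE)
which(all_dups)  # positions 1, 2, 3, 5, 6 -- every copy of 10 and 20
```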

meuleman