Find all the duplicate records using duplicated() in R

Question

I have a question about R, illustrated by the following example code.

> exdata <- data.frame(a = rep(1:4, each = 3), 
+                      b = c(1, 1, 2, 4, 5, 3, 3, 2, 3, 9, 9, 9))
> exdata
   a b
1  1 1
2  1 1
3  1 2
4  2 4
5  2 5
6  2 3
7  3 3
8  3 2
9  3 3
10 4 9
11 4 9
12 4 9
> exdata[duplicated(exdata), ]
   a b
2  1 1
9  3 3
11 4 9
12 4 9

I tried to use the duplicated() function to find all the duplicate records in the exdata data frame, but it only finds some of the duplicated records, so it is difficult to confirm intuitively whether duplicates exist.

I'm looking for a solution that returns the following results

Can I use the duplicated() function to get this result?
Or is there a way to do it with another function?
I would appreciate your help.

You have 4/9 three times in the expected outcome. But you have 3/3 only once rather than twice. Any logic behind this? — jazzurro, Mar 27 '18 at 00:33

De Novo · Accepted Answer · 2018-03-27T00:42:57.397

duplicated() returns a logical vector of the same length as its argument, with TRUE marking each element that has already appeared earlier (that is, the second and any later occurrence of a value). It has a method for data frames, duplicated.data.frame, that looks for duplicated rows, so here it returns a logical vector of length nrow(exdata). Subsetting with that logical vector returns exactly those rows that have occurred before. It WON'T, however, return the first occurrence of those rows.

Look at the index vector you're using:

duplicated(exdata)
# [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE
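
The same behavior may be easier to see on a plain vector. This is a minimal sketch; the vector x is made up for illustration and is not part of the question:

```r
# Hypothetical vector, used only for illustration
x <- c("a", "b", "a", "c", "a")

# TRUE marks the second and any later occurrence of a value
duplicated(x)
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Scanning from the end instead flags every occurrence except the last one
duplicated(x, fromLast = TRUE)
# [1]  TRUE FALSE  TRUE FALSE FALSE
```

In both directions exactly one occurrence of "a" stays FALSE, which is why a single call never flags all copies.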

But you can combine duplicated(exdata) with a second call that uses fromLast = TRUE to get all of the occurrences of these rows:

exdata[duplicated(exdata) | duplicated(exdata, fromLast = TRUE),]
#    a b
# 1  1 1
# 2  1 1
# 7  3 3
# 9  3 3
# 10 4 9
# 11 4 9
# 12 4 9

Look at the logical vector for duplicated(exdata, fromLast = TRUE), and at its combination with duplicated(exdata), to convince yourself:

duplicated(exdata, fromLast = TRUE)
#  [1]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE
duplicated(exdata) | duplicated(exdata, fromLast = TRUE)
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
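
If you need this more than once, the combination can be wrapped in a small function. This is only a sketch: all_duplicates is a hypothetical helper name, not something base R provides, and it simply reuses the two duplicated() calls shown above.

```r
# Hypothetical helper (not part of base R): keep every row that appears
# more than once in df, including its first occurrence.
all_duplicates <- function(df) {
  dup <- duplicated(df) | duplicated(df, fromLast = TRUE)
  # drop = FALSE keeps the result a data frame even for a one-column df
  df[dup, , drop = FALSE]
}

# Example data from the question
exdata <- data.frame(a = rep(1:4, each = 3),
                     b = c(1, 1, 2, 4, 5, 3, 3, 2, 3, 9, 9, 9))

rownames(all_duplicates(exdata))
# [1] "1"  "2"  "7"  "9"  "10" "11" "12"
```

Checking nrow(all_duplicates(exdata)) > 0 is then a quick way to confirm whether any duplicates exist at all.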
