I have a question about R.
Here is an example that illustrates it.

> exdata <- data.frame(a = rep(1:4, each = 3), 
+                      b = c(1, 1, 2, 4, 5, 3, 3, 2, 3, 9, 9, 9))
> exdata
   a b
1  1 1
2  1 1
3  1 2
4  2 4
5  2 5
6  2 3
7  3 3
8  3 2
9  3 3
10 4 9
11 4 9
12 4 9
> exdata[duplicated(exdata), ]
   a b
2  1 1
9  3 3
11 4 9
12 4 9

I tried to use the duplicated() function to find all the duplicate records in the exdata data frame, but it returns only some of the duplicated records, so it is hard to see at a glance which records are duplicated.

I'm looking for a solution that returns the following result:

   a b
1  1 1
2  1 1
7  3 3
9  3 3
10 4 9
11 4 9
12 4 9

Can I use the duplicated() function to get this result?
Or is there another function I should use?
I would appreciate your help.

De Novo
Lovetoken

1 Answer

duplicated returns a logical vector with the same length as its argument, in which TRUE marks every element that has already appeared earlier (the second and any later occurrences of a value). It has a method for data frames, duplicated.data.frame, that looks for duplicated rows (and so returns a logical vector of length nrow(exdata)). Subsetting with that logical vector returns exactly those rows that have occurred before. It WON'T, however, return the first occurrence of those rows.
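The same behaviour is easier to see on a plain vector. The snippet below is only a small illustration; the vector x is made up for this example and is not part of the question's data.

```r
# Hypothetical example vector: "a" appears three times
x <- c("a", "b", "a", "c", "a")

# TRUE marks elements that have already appeared earlier in x
dup <- duplicated(x)
print(dup)
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Subsetting keeps the repeats but drops the first "a"
print(x[dup])
# [1] "a" "a"
```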

Look at the index vector you're using:

duplicated(exdata)
# [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE

But you can combine it with fromLast = TRUE to get all of the occurrences of these rows:

exdata[duplicated(exdata) | duplicated(exdata, fromLast = TRUE),]
#    a b
# 1  1 1
# 2  1 1
# 7  3 3
# 9  3 3
# 10 4 9
# 11 4 9
# 12 4 9

Look at the logical vector for duplicated(exdata, fromLast = TRUE), and at its combination with duplicated(exdata), to convince yourself:

duplicated(exdata, fromLast = TRUE)
# [1]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE
duplicated(exdata) | duplicated(exdata, fromLast = TRUE)
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
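
If you need this in more than one place, the two calls can be wrapped in a small helper. This is just a sketch: the name all_duplicated is made up here and is not a base R function.

```r
# Hypothetical helper (not part of base R): TRUE for every row that
# has at least one identical row elsewhere in the data frame
all_duplicated <- function(df) {
  duplicated(df) | duplicated(df, fromLast = TRUE)
}

# Data from the question
exdata <- data.frame(a = rep(1:4, each = 3),
                     b = c(1, 1, 2, 4, 5, 3, 3, 2, 3, 9, 9, 9))

# Keep every occurrence of each duplicated row, including the first
dups <- exdata[all_duplicated(exdata), ]
print(dups)
```

Because duplicated also works on plain vectors, the same helper should work for those too, e.g. all_duplicated(exdata$b).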
De Novo