Delete all duplicated rows in R

Question

I have a data.frame with duplicate observations. How do I delete all of the duplicated rows based on the first column (i.e., if two or more rows have the same value in the first column, delete all of those rows entirely)?

> a=c(1,4,5,5,6,6)
> b=c(2,5,7,4,4,2)
> c=c("a","b","c","a","b","c")
> test=data.frame(a,b,c)
> test
  a b c
1 1 2 a
2 4 5 b
3 5 7 c
4 5 4 a
5 6 4 b
6 6 2 c

I don't want to keep any of the duplicated rows, so my final output should be:

  a b c
1 1 2 a
2 4 5 b

I've tried `unique()` and `duplicated()`, but both keep the first row of each set of duplicates (i.e., if there are 5 duplicate records, only 4 of them are deleted).
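To illustrate the behaviour described above, here is a minimal sketch that rebuilds the example data and applies `duplicated()` to the first column. `duplicated()` flags only the second and later occurrences of a value, so negating it keeps the first row of each group (the `5 7 c` and `6 4 b` rows) instead of dropping the whole group.

```r
# Rebuild the example data from the question
test <- data.frame(
  a = c(1, 4, 5, 5, 6, 6),
  b = c(2, 5, 7, 4, 4, 2),
  c = c("a", "b", "c", "a", "b", "c")
)

# TRUE only for the 2nd and later occurrence of each value in column a
dup <- duplicated(test$a)
print(dup)

# Negating it keeps the first row of every group, not only the unique values
kept <- test[!dup, ]
print(kept)
```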

What should I do? Thanks!

I do not understand why your final output should be that. I do not understand why you consider other rows as duplicated — Pop, Jul 22 '14 at 08:03
@Pop I mean duplicated ones based on the first column. The 3rd and 4th rows have the same 5 as their [,1], and the 5th and 6th rows have the same 6 as their [,1]. — Natalia, Jul 22 '14 at 08:05

score 3 · Accepted Answer · answered Jul 22 '14 at 08:11


You can use table() to get a frequency table of your column, then use the result to subset:

singletons <- names(which(table(test$a) == 1))
test[test$a %in% singletons, ]

  a b c
1 1 2 a
2 4 5 b
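To make the intermediate step visible, this sketch prints the frequency table and the singleton values. `names()` returns the values as character strings; `%in%` still matches them against the numeric column because `match()` coerces both sides to a common type (here character) before comparing.

```r
test <- data.frame(
  a = c(1, 4, 5, 5, 6, 6),
  b = c(2, 5, 7, 4, 4, 2),
  c = c("a", "b", "c", "a", "b", "c")
)

# How many times each value of column a occurs (the names are the values)
counts <- table(test$a)
print(counts)

# Values occurring exactly once, returned as character strings
singletons <- names(which(counts == 1))
print(singletons)

# %in% coerces test$a to character before matching against singletons
result <- test[test$a %in% singletons, ]
print(result)
```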

Andrie

score 2 · Answer 2 · answered Jul 22 '14 at 08:16


Using dplyr

library(dplyr)
test <- test %>% group_by(a) %>% filter(n()==1)
test

  a b c
1 1 2 a
2 4 5 b

Rentrop

Thanks, it also works! Actually, I didn't know anything about `dplyr` before. Andrie answered first, so I accepted his answer. Thanks again! – Natalia Jul 22 '14 at 08:25

score 1 · Answer 3 · answered Jul 22 '14 at 08:14

First, find the first-column values of the duplicated rows:

val <- test[duplicated(test[, 1]), 1]
val
[1] 5 6

Then find the rows in which these values appear:

rows <- test[, 1] %in% val
rows
[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE

Finally, select all rows except these:

test[!rows, ]
  a b c
1 1 2 a
2 4 5 b
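A related base R idiom (an alternative, not part of the answer above) flags every member of a duplicate group in one pass: `duplicated(x, fromLast = TRUE)` marks all but the last occurrence, so OR-ing it with the forward scan marks every occurrence of a repeated value.

```r
test <- data.frame(
  a = c(1, 4, 5, 5, 6, 6),
  b = c(2, 5, 7, 4, 4, 2),
  c = c("a", "b", "c", "a", "b", "c")
)

# Forward scan flags later occurrences; backward scan flags earlier ones
in_dup_group <- duplicated(test$a) | duplicated(test$a, fromLast = TRUE)
print(in_dup_group)

# Keep only rows whose first-column value appears exactly once
result <- test[!in_dup_group, ]
print(result)
```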

score 0 · Answer 4 · answered Jul 22 '14 at 08:10


Strange request, but if you want to remove all rows where there is a duplicate in any column while ignoring the other columns:

test[!duplicated(test$a) & ! duplicated(test$b) & ! duplicated(test$c),]
  a b c
1 1 2 a
2 4 5 b
3 5 7 c

But I don't see how '5 7 c' is a duplicate in your example.

JeremyS

Because the next row `5 4 a` also starts with 5. Anyway thanks! – Natalia Jul 22 '14 at 08:26

score 0 · Answer 5 · answered Feb 11 '21 at 19:15


Easy one-step removal of duplicated rows:

# Keeps the first copy of each fully identical row
my_df <- my_df[!duplicated(my_df), ]
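Indexing with `-which(duplicated(my_df))` rather than the logical `!duplicated(my_df)` has a caveat: when there are no duplicated rows, `which()` returns `integer(0)`, and indexing with `-integer(0)` selects zero rows, so the data frame is silently emptied. The question's `test` data has no fully identical rows, so it would be affected. A small demonstration with a hypothetical data frame `no_dups`:

```r
# Hypothetical data frame with no duplicated rows
no_dups <- data.frame(x = 1:3, y = c("a", "b", "c"))

idx <- which(duplicated(no_dups))
print(idx)  # integer(0)

# Negative indexing with an empty vector drops every row
emptied <- no_dups[-idx, ]
print(nrow(emptied))

# Logical indexing keeps all three rows
kept <- no_dups[!duplicated(no_dups), ]
print(nrow(kept))
```

Both forms keep the first copy of each whole-row duplicate, so neither removes all rows sharing a first-column value as the question asks.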

markhogue
