R: remove rows that are identical in all columns

Question

Input data:

y <- read.table(textConnection('
   c1   c2   c3
1  a    b    -1
2  a    b    -1
3  a    c    1
4  a    b    1
5  a    b    -1
'), header=TRUE)

Thus, y is:

  c1 c2 c3
1  a  b -1
2  a  b -1
3  a  c  1
4  a  b  1
5  a  b -1

The desired output would be:

  c1 c2 c3
1  a  b -1
3  a  c  1
4  a  b  1

How can I remove duplicate rows, i.e. rows that have the same entry in every column?

see also eg http://stackoverflow.com/questions/5016418/summarise-data-frame-ignoring-repetition , http://stackoverflow.com/questions/2626567/collapsing-data-frame-by-selecing-one-row-per-group — Joris Meys, Apr 28 '11 at 15:05

score 9 · Answer 1 · answered Apr 28 '11 at 14:47

Try unique(y):

> unique(y)
  c1 c2 c3
1  a  b -1
3  a  c  1
4  a  b  1

Chase

score 3 · Answer 2 · answered Apr 28 '11 at 14:50

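See ?unique. Watch out for floating-point columns, though: two values that print identically can differ in their last bits, in which case they are not treated as duplicates.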
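
A minimal sketch of the floating-point caveat. The vector x, the data frame z, and the choice of 10 decimal places are assumptions made up for this illustration; they are not part of the question, and a suitable tolerance depends on your data.

```r
# Two values that both print as 0.3 but are not bit-for-bit equal
x <- c(0.3, 0.1 + 0.2)
x[1] == x[2]                  # FALSE
length(unique(x))             # 2: both values are kept

# Rounding to a chosen tolerance makes them comparable
length(unique(round(x, 10)))  # 1

# Same idea for a data frame: round the numeric columns, then deduplicate
z <- data.frame(c1 = c("a", "a"), c3 = x)
num <- vapply(z, is.numeric, logical(1))   # which columns are numeric
z[num] <- lapply(z[num], round, digits = 10)
z_unique <- unique(z)
nrow(z_unique)                # 1
```

Rounding changes the stored values, so if the original precision matters it may be preferable to round a copy and use it only to decide which rows to keep.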

Nick Sabbe

score 2 · Answer 3 · answered Apr 28 '11 at 15:22

In addition to unique(), duplicated() is also helpful for identifying which rows are duplicates.

For example:

subset(y, !duplicated(y))

But as Chase and Nick show, unique() is what you are looking for here and is the more direct way to get it.
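
To make the relationship between the two functions concrete, here is a short sketch using the data frame from the question. The variable names dup, first_kept and last_kept are only illustrative.

```r
y <- read.table(text = "
   c1   c2   c3
1  a    b    -1
2  a    b    -1
3  a    c    1
4  a    b    1
5  a    b    -1
", header = TRUE)

dup <- duplicated(y)          # TRUE marks a repeat of an earlier row
dup                           # FALSE TRUE FALSE FALSE TRUE
which(dup)                    # 2 5

first_kept <- y[!dup, ]       # rows 1, 3, 4; same result as unique(y)

# fromLast = TRUE keeps the last occurrence of each row instead of the first
last_kept <- y[!duplicated(y, fromLast = TRUE), ]   # rows 3, 4, 5
```

duplicated() is the better fit when you want to inspect or count the repeated rows before dropping them, or when you need to control which occurrence is kept.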

jthetzel

score 0 · Answer 4 · answered Jul 07 '18 at 10:37

You can also use distinct() from the dplyr package:

> library(dplyr, quietly = TRUE)
> distinct(y)
  c1 c2 c3
1  a  b -1
2  a  c  1
3  a  b  1
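
Note that the row names in this output differ from those in the base R result. A small sketch of the difference, assuming the question's data frame and guarding the dplyr call in case the package is not installed (the names u and d are illustrative):

```r
y <- read.table(text = "
   c1   c2   c3
1  a    b    -1
2  a    b    -1
3  a    c    1
4  a    b    1
5  a    b    -1
", header = TRUE)

u <- unique(y)
print(rownames(u))            # "1" "3" "4": original row names are kept

if (requireNamespace("dplyr", quietly = TRUE)) {
  d <- dplyr::distinct(y)
  print(rownames(d))          # "1" "2" "3": rows are renumbered
}

# To get the same numbering from base R, reset the row names
rownames(u) <- NULL
```

Which behaviour is preferable depends on whether the original row names carry meaning you want to keep track of.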

BRCN
