
Input file:

y <- read.table(textConnection('
   c1   c2   c3
1  a    b    -1
2  a    b    -1
3  a    c    1
4  a    b    1
5  a    b    -1
'), header=TRUE)

Thus, y is:

  c1 c2 c3
1  a  b -1
2  a  b -1
3  a  c  1
4  a  b  1
5  a  b -1

The output file would be:

  c1 c2 c3
1  a  b -1
3  a  c  1
4  a  b  1

How can I remove duplicate rows, i.e. rows that have the same entry in every column?

Catherine
See also, e.g., http://stackoverflow.com/questions/5016418/summarise-data-frame-ignoring-repetition and http://stackoverflow.com/questions/2626567/collapsing-data-frame-by-selecing-one-row-per-group – Joris Meys, Apr 28 '11 at 15:05

4 Answers

Try unique(y)

> unique(y)
  c1 c2 c3
1  a  b -1
3  a  c  1
4  a  b  1
Chase

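See ?unique. Watch out for floating-point columns, though: values that print identically, such as 0.3 and 0.1 + 0.2, can differ in their last bits, so unique() may treat them as different rows.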
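A minimal sketch of that caveat, using a made-up one-column data frame z (not from the question). Both values print as 0.3, but they are not exactly equal, so unique() keeps both rows. Rounding to a tolerance you choose first (10 digits here is an arbitrary assumption) collapses them.

```r
# Hypothetical one-column data frame: both values print as 0.3
z <- data.frame(x = c(0.3, 0.1 + 0.2))
print(z)

# Exact comparison: 0.1 + 0.2 differs from 0.3 in its last bits
print(0.1 + 0.2 == 0.3)   # FALSE
print(nrow(unique(z)))    # 2 rows survive

# Round to a chosen number of digits first, then de-duplicate
z_rounded <- unique(round(z, 10))
print(nrow(z_rounded))    # 1 row survives
```

The right number of digits depends on your data; too few digits will merge values that really are different.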

Nick Sabbe

In addition to unique(), duplicated() is also helpful for identifying which rows are duplicates.

For example:

subset(y, !duplicated(y))

But as Chase and Nick show, unique() is what you are looking for here and is more efficient.
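A sketch of a few other things duplicated() can do, with y rebuilt from the question via data.frame(). The fromLast argument keeps the last copy of each row instead of the first, OR-ing both directions flags every copy, and passing only some columns (c1 and c2, chosen just for illustration) de-duplicates on those columns while keeping whole rows.

```r
# Same data as in the question
y <- data.frame(
  c1 = c("a", "a", "a", "a", "a"),
  c2 = c("b", "b", "c", "b", "b"),
  c3 = c(-1, -1, 1, 1, -1)
)

# Positions of rows that repeat an earlier row
dup_idx <- which(duplicated(y))
print(dup_idx)                       # 2 5

# Same result as unique(y): keeps the first copy (rows 1, 3, 4)
first_kept <- y[!duplicated(y), ]

# Keep the last copy of each row instead (rows 3, 4, 5)
last_kept <- y[!duplicated(y, fromLast = TRUE), ]

# Flag every copy of a duplicated row, including the first (rows 1, 2, 5)
all_copies <- duplicated(y) | duplicated(y, fromLast = TRUE)
print(which(all_copies))             # 1 2 5

# De-duplicate on c1 and c2 only, keeping full rows (rows 1 and 3)
by_c1_c2 <- y[!duplicated(y[c("c1", "c2")]), ]
print(by_c1_c2)
```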

jthetzel

You can also use distinct() from the dplyr package:

> library(dplyr, quietly = TRUE)
> distinct(y)
  c1 c2 c3
1  a  b -1
2  a  c  1
3  a  b  1
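distinct() can also de-duplicate on selected columns. A sketch, assuming a dplyr version that supports the .keep_all argument: by default only the named columns are returned, while .keep_all = TRUE keeps the other columns from the first row of each distinct combination.

```r
library(dplyr)

# Same data as in the question
y <- data.frame(
  c1 = c("a", "a", "a", "a", "a"),
  c2 = c("b", "b", "c", "b", "b"),
  c3 = c(-1, -1, 1, 1, -1)
)

# Only the c1 and c2 columns, one row per distinct combination
pairs_only <- distinct(y, c1, c2)
print(pairs_only)

# Same de-duplication, but keep c3 from the first matching row
pairs_full <- distinct(y, c1, c2, .keep_all = TRUE)
print(pairs_full)
```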
BRCN