
I have a data frame with 5 columns and a very large number of rows. Values repeat only in the first 3 columns: the data is a volume assembled from several smaller volumes, so the same coordinates (x, y, z) appear more than once with different labels. I would like to remove the rows with repeated coordinates.

How can I eliminate these with R commands?

Thanks AV

  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Mar 01 '16 at 08:01

1 Answer


You can use the duplicated function, e.g.:

# create an example data.frame
Lab1<-letters[1:10]
Lab2<-LETTERS[1:10]
x <- c(3,4,3,3,4,2,4,3,9,0)
y <- c(3,4,3,5,4,2,1,5,7,2)
z <- c(8,7,8,8,4,3,1,8,6,3)
DF <- data.frame(Lab1,Lab2,x,y,z)

> DF
   Lab1 Lab2 x y z
1     a    A 3 3 8
2     b    B 4 4 7
3     c    C 3 3 8
4     d    D 3 5 8
5     e    E 4 4 4
6     f    F 2 2 3
7     g    G 4 1 1
8     h    H 3 5 8
9     i    I 9 7 6
10    j    J 0 2 3

# remove rows having repeated x,y,z 
DF2 <- DF[!duplicated(DF[,c('x','y','z')]),]

> DF2
   Lab1 Lab2 x y z
1     a    A 3 3 8
2     b    B 4 4 7
4     d    D 3 5 8
5     e    E 4 4 4
6     f    F 2 2 3
7     g    G 4 1 1
9     i    I 9 7 6
10    j    J 0 2 3
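As a small variation on the above, duplicated also accepts a fromLast argument, which keeps the last row of each coordinate group instead of the first. The sketch below rebuilds the same example data; the names coords, DF_first, DF_last and n_dropped are only illustrative and not part of the original answer.

```r
# Rebuild the example data.frame from the answer (stringsAsFactors = FALSE
# keeps the labels as plain character vectors on any R version).
DF <- data.frame(
  Lab1 = letters[1:10],
  Lab2 = LETTERS[1:10],
  x = c(3, 4, 3, 3, 4, 2, 4, 3, 9, 0),
  y = c(3, 4, 3, 5, 4, 2, 1, 5, 7, 2),
  z = c(8, 7, 8, 8, 4, 3, 1, 8, 6, 3),
  stringsAsFactors = FALSE
)

# Only the coordinate columns take part in the duplicate check.
coords <- DF[, c("x", "y", "z")]

# Default behaviour: keep the first row of each (x, y, z) group.
DF_first <- DF[!duplicated(coords), ]

# fromLast = TRUE scans from the bottom, so the last row of each group is kept.
DF_last <- DF[!duplicated(coords, fromLast = TRUE), ]

# Number of rows dropped because their coordinates were already seen.
n_dropped <- sum(duplicated(coords))
```

With this data, DF_first keeps rows a and d for the two repeated coordinates, while DF_last keeps rows c and h instead.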

EDIT:

To choose among the rows that share the same coordinates, you can use, for example, the by function (although it is less efficient than the previous approach):

res <- by(DF,
      INDICES=paste(DF$x,DF$y,DF$z,sep='|'),
      FUN=function(equalRows){
             # equalRows is a data.frame with the rows having the same x,y,z
             # for example, here we choose the first row after ordering by Lab1, then Lab2
             row <- equalRows[order(equalRows$Lab1,equalRows$Lab2),][1,]
             return(row)
      })

DF2 <- do.call(rbind.data.frame,res)
> DF2
      Lab1 Lab2 x y z
0|2|3    j    J 0 2 3
2|2|3    f    F 2 2 3
3|3|8    a    A 3 3 8
3|5|8    d    D 3 5 8
4|1|1    g    G 4 1 1
4|4|4    e    E 4 4 4
4|4|7    b    B 4 4 7
9|7|6    i    I 9 7 6
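If the by approach turns out to be too slow on a large data frame, one possible alternative (a sketch, not something the answer above prescribes) is to sort the rows so that the preferred row of each group comes first, and then apply duplicated as before. Here the rule "prefer the largest Lab1" is an arbitrary assumption chosen only for illustration, and DF_sorted and DF_pick are hypothetical names.

```r
# Rebuild the example data.frame from the answer.
DF <- data.frame(
  Lab1 = letters[1:10],
  Lab2 = LETTERS[1:10],
  x = c(3, 4, 3, 3, 4, 2, 4, 3, 9, 0),
  y = c(3, 4, 3, 5, 4, 2, 1, 5, 7, 2),
  z = c(8, 7, 8, 8, 4, 3, 1, 8, 6, 3),
  stringsAsFactors = FALSE
)

# Sort so that the preferred row of every coordinate group comes first.
# Assumed rule for this sketch: prefer the largest Lab1.
DF_sorted <- DF[order(DF$Lab1, decreasing = TRUE), ]

# duplicated() keeps the first occurrence, which is now the preferred row.
DF_pick <- DF_sorted[!duplicated(DF_sorted[, c("x", "y", "z")]), ]

# Optional: restore the original row order using the preserved row names.
DF_pick <- DF_pick[order(as.integer(rownames(DF_pick))), ]
```

With this rule, the coordinate (3, 3, 8) keeps label c rather than a, and (3, 5, 8) keeps h rather than d. Changing the arguments to order changes which row of each group survives.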
digEmAll
  • Many thanks, it is exactly what I need! But how can I choose which of the repeated labels to delete, e.g. in your example A vs C for x=3, y=3, z=8? – user3459094 Mar 01 '16 at 13:15