Consider the following basic example:

v1 <- c("a","b","c","a","b")
v2 <- c(1,2,3,1,1)
v3 <- rnorm(5,5) 

dat <- data.frame(cbind(v1,v2,v3))

I want to remove all rows that share the same values in v1 and v2 with another row.

To remove duplicated rows I can use

dat[!duplicated(dat[,c("v1","v2")]),]

   v1 v2 v3
1  a  1 6.48929449801677
2  b  2 4.89050807004701
3  c  3 5.57089903349316
5  b  1 4.08152834124853

But I also want to remove the first row, since it has the same v1 and v2 values as row 4.

Does anyone have a simple solution? Maybe there is an option in duplicated() that I was not able to find.

Sebastian

1 Answer

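We can call duplicated() a second time with the fromLast=TRUE option to search for duplicates in the reverse direction, then combine both results with | to flag every row that has a duplicate, including the first occurrence. Negating that logical index leaves only the rows whose combination of v1 and v2 is unique, and we use it to subset dat.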

dat[!(duplicated(dat[,c("v1","v2")])|
     duplicated(dat[,c("v1", "v2")], fromLast=TRUE)),]
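
To see why the two calls are needed, it may help to look at the intermediate logical vectors. The following is a minimal sketch, not the original data: it assumes the columns are built directly with data.frame() instead of cbind(), and it leaves out v3 because that column plays no part in the comparison. The names key, fwd, bwd, dups and result are only illustrative.

```r
# Assumption: same v1/v2 values as in the question, built without cbind();
# v3 is omitted because it is not used to detect duplicates.
dat <- data.frame(v1 = c("a", "b", "c", "a", "b"),
                  v2 = c(1, 2, 3, 1, 1))

# Only these columns decide whether two rows count as duplicates.
key <- dat[, c("v1", "v2")]

# Forward scan: flags a row only if an identical row appeared earlier.
fwd <- duplicated(key)                   # FALSE FALSE FALSE  TRUE FALSE

# Backward scan: flags a row only if an identical row appears later.
bwd <- duplicated(key, fromLast = TRUE)  #  TRUE FALSE FALSE FALSE FALSE

# A row belongs to a duplicated group if either scan flags it.
dups <- fwd | bwd                        #  TRUE FALSE FALSE  TRUE FALSE

# Keep only the rows whose (v1, v2) combination occurs exactly once.
result <- dat[!dups, ]                   # rows 2, 3 and 5 remain
```

Neither scan alone is enough: the forward scan misses row 1 and the backward scan misses row 4, so only their union marks the whole duplicated group.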
akrun