
This may be a failure to know the right keywords to search, but I'm looking for a way to remove duplicates based on an order reversal between two non-numeric columns. Here is a very small subset of my data:

ANIMAL1<-c("20074674_K.v1","20085105_K.v1","20085638_K.v1","20085646_K.v1")
ANIMAL2<-c("20085105_K.v1","20074674_K.v1","20074674_K.v1","20074674_K.v1")
exclusions<-c(13,13,5,10)
data<-data.frame(ANIMAL1,ANIMAL2,exclusions)
        ANIMAL1       ANIMAL2 exclusions
1 20074674_K.v1 20085105_K.v1         13
2 20085105_K.v1 20074674_K.v1         13
3 20085638_K.v1 20074674_K.v1          5
4 20085646_K.v1 20074674_K.v1         10

The first and second rows are duplicate comparisons; the order of the animals is just reversed between the first two columns. It doesn't matter which one is deleted, but I want to delete one of the duplicates... and all the other duplicates that fit this logic in my larger data frame. I'm used to subsetting according to the logic in these questions: Remove duplicate column pairs, sort rows based on 2 columns, and the other posts that come up when searching "remove duplicates based on 2 columns", but I haven't found anything yet that approximates my use case. Here is what I would like my data to look like after removing the duplicates:

        ANIMAL1       ANIMAL2 exclusions
1 20085105_K.v1 20074674_K.v1         13
2 20085638_K.v1 20074674_K.v1          5
3 20085646_K.v1 20074674_K.v1         10

Thanks much!
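A different technique from the sort-based answer, offered only as a sketch: build an order-independent key with the vectorised pmin() and pmax(), which also accept character vectors, so that each pair and its reversal give the same (lo, hi) combination. Passing fromLast = TRUE to duplicated() keeps the later copy of each pair, which reproduces the desired output shown in the question, and resetting the row names renumbers the rows 1 to 3. The names key, lo, hi and result are illustrative; as.character() is there as a guard in case the ID columns are factors (the data.frame() default before R 4.0.0).

```r
# Recreate the example data from the question.
ANIMAL1 <- c("20074674_K.v1", "20085105_K.v1", "20085638_K.v1", "20085646_K.v1")
ANIMAL2 <- c("20085105_K.v1", "20074674_K.v1", "20074674_K.v1", "20074674_K.v1")
exclusions <- c(13, 13, 5, 10)
data <- data.frame(ANIMAL1, ANIMAL2, exclusions, stringsAsFactors = FALSE)

# Put the smaller ID first and the larger second, so that a pair and its
# reversal produce the same (lo, hi) combination.
a1 <- as.character(data$ANIMAL1)
a2 <- as.character(data$ANIMAL2)
key <- data.frame(lo = pmin(a1, a2), hi = pmax(a1, a2), stringsAsFactors = FALSE)

# fromLast = TRUE flags the earlier copy of each pair, so the later row is kept.
result <- data[!duplicated(key, fromLast = TRUE), ]
rownames(result) <- NULL  # renumber the remaining rows 1, 2, 3
print(result)
```

Only the two ID columns go into the key, so two copies of a pair with different exclusions values would still be treated as duplicates. Drop fromLast = TRUE to keep the first copy instead.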

1 Answer

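data[!duplicated(t(apply(data[, c("ANIMAL1", "ANIMAL2")], 1, sort))), ]
  1. Sort the two animal IDs within each row, so that a pair has the same order whichever column each animal appeared in. Only ANIMAL1 and ANIMAL2 need sorting; including exclusions in the check would make two copies of a pair with different exclusion counts look distinct.
  2. apply() returns the sorted rows as columns, so transpose the result with t() to get one row per original row again.
  3. Flag the duplicated rows with duplicated() and keep only the rows where it is FALSE. This keeps the first occurrence of each pair; add fromLast = TRUE to duplicated() to keep the last one instead.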
  • Would you mind explaining your answer? – Quality Catalyst Aug 03 '17 at 10:32
  • I think this works; I just hacked it together. Basically: 1. Sort each row, so that each row's combination of ANIMAL1 and ANIMAL2 is the same even if the animals are in different columns; exclusions are sorted too, but in this case you don't have to. 2. When it is sorted by rows, the data needs to be transposed back to columns as in the original data set. 3. Flag row duplicates and strip them out. Not sure if this is what you are looking for... – xgg Aug 07 '17 at 16:54