
Apologies if this is a duplicate. It seems simple enough that it may have been asked already, but a quick search didn't turn up an exact match for my particular issue. If it does exist, I'd appreciate a link to the question.

Data frame for reference (I built this example by hand, so I don't have dput() output for now, but I could provide it):

> head(data[, 1:6], n = 4)
             A         B        C         D         E         F       
1       Donald      Will      Joe     Chris      Greg     Isaiah  
2       Donald      Will     Jeff     Chris      Greg     Isaiah
3       Donald      Will     Jeff     Steve      Greg     Isaiah
4       Donald      Will     Jeff     Steve    Isaiah       Greg
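Since `dput()` output isn't available, here is a sketch of how the example could be rebuilt from the printed values with `data.frame()`. The column names A to F and the names themselves are copied from the output above, and the object name `data` follows the `head()` call; running `dput()` on the result prints a version others can paste straight into R.

```r
# Rebuild the four example rows shown above. A and B hold the same value
# in every row, so data.frame() recycles the single value to 4 rows.
data <- data.frame(
  A = "Donald",
  B = "Will",
  C = c("Joe", "Jeff", "Jeff", "Jeff"),
  D = c("Chris", "Chris", "Steve", "Steve"),
  E = c("Greg", "Greg", "Greg", "Isaiah"),
  F = c("Isaiah", "Isaiah", "Isaiah", "Greg"),
  stringsAsFactors = FALSE
)

# Print a structure() call that others can paste to recreate `data`
dput(data)
```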

In this (small example of my larger) data frame, I need to remove any duplicate rows, where a row counts as a duplicate if it contains all the same names as another row, regardless of which columns the names are in. So in this case, row 4 would be considered a duplicate of row 3, and I would want to remove either one of them.
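To make that definition concrete: two rows match when sorting their values, ignoring which column each value came from, gives identical vectors. A minimal sketch using a hypothetical helper `same_names()` on the example rows (note that sorting keeps repeated names, so a row with a name twice only matches another row with that name twice):

```r
df <- data.frame(
  A = "Donald",
  B = "Will",
  C = c("Joe", "Jeff", "Jeff", "Jeff"),
  D = c("Chris", "Chris", "Steve", "Steve"),
  E = c("Greg", "Greg", "Greg", "Isaiah"),
  F = c("Isaiah", "Isaiah", "Isaiah", "Greg"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when two one-row data frames hold the same
# names in any column order. Column names are dropped so only the
# values are compared; the data frame itself is never reordered.
same_names <- function(x, y) {
  identical(sort(unlist(x, use.names = FALSE)),
            sort(unlist(y, use.names = FALSE)))
}

same_names(df[3, ], df[4, ])  # TRUE: only columns E and F are swapped
same_names(df[2, ], df[3, ])  # FALSE: Chris vs Steve in column D
```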

Of note, the order of the columns is very important in my dataframe, and so I cannot simply sort each row alphabetically and then remove exact duplicates.

Thanks for any help!!

Canovice
  • You could create a new variable in which you sort the names in each row and then paste them together as one string. You can then use that string to find duplicates with the duplicated() function – Huub Hoofs Sep 19 '16 at 21:04
  • Okay, thanks, I'll give this a try. In the meantime, I was looking / hoping for a solution or function that works directly on the data frame – Canovice Sep 19 '16 at 21:10
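The first comment's suggestion can be sketched roughly as follows: build one sorted, pasted key per row and pass it to `duplicated()`. The `|` separator is an assumption (it must not occur inside any name), and the key is kept in a separate vector rather than a new column, so the data frame's columns stay in their original order.

```r
df <- data.frame(
  A = "Donald",
  B = "Will",
  C = c("Joe", "Jeff", "Jeff", "Jeff"),
  D = c("Chris", "Chris", "Steve", "Steve"),
  E = c("Greg", "Greg", "Greg", "Isaiah"),
  F = c("Isaiah", "Isaiah", "Isaiah", "Greg"),
  stringsAsFactors = FALSE
)

# One key string per row: that row's names sorted alphabetically and
# pasted together. apply() hands each row over as a character vector.
key <- apply(df, 1, function(row) paste(sort(row), collapse = "|"))

# duplicated() flags rows whose key already appeared earlier;
# keep the rest, with columns left in their original order
deduped <- df[!duplicated(key), ]
deduped
```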

1 Answer

df <- read.table(header=TRUE,stringsAsFactors=FALSE,text="
             A         B        C         D         E         F       
1       Donald      Will      Joe     Chris      Greg     Isaiah  
2       Donald      Will     Jeff     Chris      Greg     Isaiah
3       Donald      Will     Jeff     Steve      Greg     Isaiah
4       Donald      Will     Jeff     Steve    Isaiah       Greg")


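# Sort each row's names only to build a comparison matrix; the rows kept
# from df still have their original column order
df <- df[!duplicated(t(apply(df,1,sort))),]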
ddunn801
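Why the one-liner works, plus a variant: `apply(df, 1, sort)` returns one *column* per original row, so `t()` flips it back to one row per original row before `duplicated()`, which compares matrix rows. Since either copy may be removed, `fromLast = TRUE` keeps the later copy (row 4) instead of the earlier one. A sketch on the same example data:

```r
df <- data.frame(
  A = "Donald",
  B = "Will",
  C = c("Joe", "Jeff", "Jeff", "Jeff"),
  D = c("Chris", "Chris", "Steve", "Steve"),
  E = c("Greg", "Greg", "Greg", "Isaiah"),
  F = c("Isaiah", "Isaiah", "Isaiah", "Greg"),
  stringsAsFactors = FALSE
)

# apply(..., 1, sort) gives a 6 x 4 matrix (one column per row of df);
# t() turns it into 4 x 6 so each matrix row matches a row of df
sorted_rows <- t(apply(df, 1, sort))

# Default: flag later repeats, so the first copy (row 3) is kept
keep_first <- df[!duplicated(sorted_rows), ]

# fromLast = TRUE: flag earlier repeats, so the last copy (row 4) is kept
keep_last <- df[!duplicated(sorted_rows, fromLast = TRUE), ]

keep_first
keep_last
```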