I have a data frame with 1000 observations on 20 variables.

I want to keep only the rows whose combination of values across the columns is unique, regardless of the order in which the values appear.

For example, if one row's combination is ABA and another's is BAA, the code should return only one of them.

To identify unique combinations, I currently run a simple unique command across multiple variables.

How would you write such code?

Jaap
wake_wake

1 Answer

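We can sort the values within each row using apply with MARGIN=1, transpose the result with t so that each sorted row is a row again (apply returns them as columns), and then use duplicated to get a logical index of the repeated rows. Negating that index and subsetting the data returns one row per unique combination.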

dat[!duplicated(t(apply(dat, 1, sort))),]
akrun
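
To see what the one-liner in the answer does, here is a minimal sketch on a small made-up data frame. The column names v1 to v3, the values, and the intermediate names sorted_rows, is_dup and result are all hypothetical and chosen only for illustration; the question's real data has 20 variables.

```r
# Hypothetical toy data: rows 1 and 2 hold the same values in a different order.
dat <- data.frame(
  v1 = c("A", "B", "A", "C"),
  v2 = c("B", "A", "B", "C"),
  v3 = c("A", "A", "B", "D"),
  stringsAsFactors = FALSE
)

# Sort the values within each row; apply() returns the sorted rows as columns,
# so t() turns them back into rows.
sorted_rows <- t(apply(dat, 1, sort))

# duplicated() on a matrix flags every row that repeats an earlier row.
is_dup <- duplicated(sorted_rows)

# Keep only the first occurrence of each order-independent combination.
result <- dat[!is_dup, ]
print(result)
```

Here row 2 (B, A, A) is dropped because it sorts to the same values as row 1 (A, B, A), so rows 1, 3 and 4 remain in their original, unsorted form. Two things to keep in mind with this approach: apply converts the data frame to a matrix first, so columns of mixed types are coerced to a common type, and sort drops NA values by default, so rows containing NAs would likely need separate handling.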