
I am trying to remove rows from a data frame whose two columns hold the same pair of values as an earlier row, in either order. For example, the following code:

vct <- c("A", "B", "C")
a <- b <- vct
combo <- expand.grid(a,b) #generate all possible combinations
combo <- combo[!combo[,1] == combo[,2],] #removes rows with matching column

generates this data frame:

 Var1 Var2
2    B    A
3    C    A
4    A    B
6    C    B
7    A    C
8    B    C

How can I remove rows that repeat another row's pair of values in either order, so that, e.g., row 4 (A B) is removed because row 2 (B A) is already present? The resulting data frame would look like this:

 Var1 Var2
2    B    A
3    C    A
4    C    B
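If only unordered pairs of distinct values are needed, one way to avoid the reversed duplicates entirely is to generate them once with base R's combn() instead of expand.grid(). This is a sketch under that assumption, not a way to filter an existing data frame. The column swap [, 2:1] is there only to match the (B A, C A, C B) orientation shown above. The resulting row names are 1 to 3, not 2, 3, 4.

```r
vct <- c("A", "B", "C")

# combn() returns each unordered pair of distinct elements exactly once,
# as the columns of a 2 x choose(n, 2) matrix; t() turns the pairs into rows.
# [, 2:1] swaps the two columns to match the orientation in the question.
pairs <- as.data.frame(t(combn(vct, 2))[, 2:1], stringsAsFactors = FALSE)
names(pairs) <- c("Var1", "Var2")
pairs
#   Var1 Var2
# 1    B    A
# 2    C    A
# 3    C    B
```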
Jaap
trock2000

1 Answer


We can sort each row using apply with MARGIN = 1, transpose (t) the output (apply returns the sorted rows as columns), use duplicated to get the logical index of duplicate rows, negate it (!) to keep the rows that are not duplicated, and subset the dataset.

combo[!duplicated(t(apply(combo, 1, sort))),]
#   Var1 Var2
#2    B    A
#3    C    A
#6    C    B
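A vectorized variant of the same idea is sketched below, assuming the columns are character vectors (hence stringsAsFactors = FALSE, which differs from the question's code). It builds the canonical (smaller, larger) pair for every row at once with pmin()/pmax() instead of sorting row by row with apply(). The helper data frame name key and its columns lo and hi are illustrative only.

```r
vct <- c("A", "B", "C")
# Assumption: character columns rather than factors, so pmin()/pmax() can compare them.
combo <- expand.grid(vct, vct, stringsAsFactors = FALSE)
combo <- combo[combo$Var1 != combo$Var2, ]  # drop rows where both columns match

# pmin()/pmax() work element-wise, so each row's pair becomes (smaller, larger)
# without a per-row apply() call.
key <- data.frame(lo = pmin(combo$Var1, combo$Var2),
                  hi = pmax(combo$Var1, combo$Var2))

# duplicated() on a data frame flags rows whose (lo, hi) pair appeared earlier.
res <- combo[!duplicated(key), ]
res
#   Var1 Var2
# 2    B    A
# 3    C    A
# 6    C    B
```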
akrun