Remove duplicate rows by considering two columns and ignore the order

Question

I have a data frame with three columns: Column1, Column2 and Value. The data frame is sorted by Value in descending order. In the case below, I want to remove the third row because A>B is already present, so I don't want to keep B>A. How can I remove the third row (and other such instances)? This applies to every pair: for example, if A>C is already present, C>A should be removed.

Column1 Column2 Value
A       B       10
A       C       8
B       A       6

score 3 · Answer 1 · answered Mar 30 '17 at 10:45

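We can sort each row across the columns of interest (the first two) with apply, transpose the result back to one row per observation, and call duplicated on it. Negating that logical vector and using it to subset the rows keeps only the first occurrence of each unordered pair.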

df1[!duplicated(t(apply(df1[1:2], 1, sort))),]
#   Column1 Column2 Value
#1       A       B    10
#2       A       C     8
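
For a self-contained check, here is a minimal sketch of the same idea using pmin and pmax instead of apply. It assumes the two key columns are character (not factor), and the names key, lo, hi and result are made up for illustration. The data frame is rebuilt from the table in the question.

```r
# Rebuild the example data from the question. stringsAsFactors = FALSE keeps
# the key columns as character, which pmin/pmax need to compare them as strings.
df1 <- data.frame(
  Column1 = c("A", "A", "B"),
  Column2 = c("B", "C", "A"),
  Value   = c(10, 8, 6),
  stringsAsFactors = FALSE
)

# Order-independent key for each row: smaller label first, larger label second.
# (key, lo and hi are illustrative names, not part of the original answer.)
key <- data.frame(
  lo = pmin(df1$Column1, df1$Column2),
  hi = pmax(df1$Column1, df1$Column2)
)

# duplicated() flags the second and later occurrences of each key, so the
# first row of each unordered pair is the one that survives.
result <- df1[!duplicated(key), ]
print(result)
```

Because duplicated keeps the first occurrence, and the data is sorted by Value in descending order, the row retained for each pair should be the one with the highest Value. pmin and pmax are vectorised, so this may scale better than a row-wise apply on large data, though that is not measured here.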
