Filter rows based on swapped combinations

Question

Imagine I have a data frame with three columns, where column 1 and column 2 together define a unique combination with a certain output 'value'. However, I want to filter out those rows where the two columns are just swapped, since the outcome is the same, and retain only one row per combination.

e.g. the pair 2 - 1 with value 1 and the pair 1 - 2 with value 1 are technically the same

df <- data.frame(column1 = c(2,3,4,1,3,4,1,2,4), 
                 column2 = c(1,1,1,2,2,2,3,3,3), 
                 value = c(1,2,10,1,2,4,2,2,5))

Since I don't have any reasonable code that tackles this issue, I'd appreciate any help or hints!

out <- data.frame(column1 = c(2,3,4,3,4,4), column2 = c(1,1,1,2,2,3), value = c(1,2,10,2,4,5)) – c.k, Dec 07 '19 at 01:53
In fact, there are > 10k pairs within the data frame where column1 and column2 are swapped and the output is the same (derived from a Hamming distance calculation). Since 1 : 2 == 2 : 1, I want to filter out all those duplicated events with the same output! Thanks – c.k, Dec 07 '19 at 01:54

score 0 · Accepted Answer · answered Dec 07 '19 at 01:57

You can use pmin and pmax to sort the two columns within each row, so that swapped pairs become identical, and then select unique rows.

library(dplyr)
df %>%
  mutate(temp1 = pmax(column1, column2), 
         temp2 = pmin(column1, column2)) %>%
  select(temp1, temp2, value) %>%
  distinct()

#  temp1 temp2 value
#1     2     1     1
#2     3     1     2
#3     4     1    10
#4     3     2     2
#5     4     2     4
#6     4     3     5
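
If you would rather keep the original column names and avoid the dplyr dependency, the same pmin/pmax idea can be sketched in base R. This is only a minimal sketch: the helper names `key`, `lo`, `hi` and `out` are made up for illustration, and it assumes that you want to keep the first row encountered for each swapped pair.

```r
df <- data.frame(column1 = c(2, 3, 4, 1, 3, 4, 1, 2, 4),
                 column2 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 value   = c(1, 2, 10, 1, 2, 4, 2, 2, 5))

# Order-independent key per row: smaller id first, larger id second.
# `key`, `lo` and `hi` are illustrative names, not part of the original data.
key <- data.frame(lo    = pmin(df$column1, df$column2),
                  hi    = pmax(df$column1, df$column2),
                  value = df$value)

# duplicated() flags every row whose key has already been seen,
# so negating it keeps the first occurrence of each pair.
out <- df[!duplicated(key), ]
rownames(out) <- NULL
out
```

Because the rows are subset from df itself, column1 and column2 keep their original names and orientation instead of being replaced by the temp columns.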

Ronak Shah


This is amazing!! Thank you so much :) – c.k Dec 07 '19 at 02:01
