Removing mutual reference rows in R dataframe i.e. when (a, b) value in a row exists as (b, a) in another row of the same dataframe

Question

Let's say I have a data frame (df) that is supposed to contain friendship links between individuals. A value (e.g. an individual ID) in column A and a value in column B indicate that individual A is a friend of (in a relationship with) individual B. Such a df can easily be converted to a graph (e.g. with igraph).

Since the relationships are mutual, it is sufficient to store each pair once, as A - B.

However, I have a large df in which some rows also contain the reversed B - A pair (as in a directed graph: A is a friend of B and B is a friend of A), which is redundant. The question is how to remove these redundant rows.

Here is a very simple example:

df <- data.frame("A"= c(1, 10, 1,  1,  2,  2, 14, 4),
                 "B"= c(10, 1, 11, 12, 13, 14, 2, 15))

A          B
1         10
10         1
1         11
1         12
2         13
2         14
14        2
4         15

After removing mutual references, the df should become:

A          B
1         10
1         11
1         12
2         13
2         14
4         15

[See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data and all necessary code (not from pictures, which we can't copy the code from). — camille, Sep 27 '19 at 18:23
Many thanks for all the comments and hints. Special thanks to @qq3254 for notifying the mistake (comma is added 13) as well as the excellent solution. — ama, Sep 28 '19 at 01:31

score 2 · Accepted Answer · answered Sep 27 '19 at 18:50

I believe you are looking for something like this.

Sort the values within each row, so that (a, b) and (b, a) become the same pair, then drop the rows whose sorted pair has already appeared.

df <- data.frame("A" = c(1, 10, 1,  1,  2,  2, 14, 4),
                 "B" = c(10, 1, 11, 12, 13, 14, 2, 15))
# Sort each row so that (a, b) and (b, a) become identical
sorted <- t(apply(df, 1, sort))
# Keep only the first occurrence of each sorted pair
df[!duplicated(sorted), ]
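
Since the question mentions a large df, a vectorized variant may be worth trying. This is only a sketch of the same idea, not something benchmarked here: it builds the sorted pair with `pmin()` and `pmax()` instead of calling `sort()` row by row through `apply()`. The names `lo`, `hi` and `result` are made up for this example, and it assumes both columns have the same type (numeric IDs, as in the question).

```r
df <- data.frame(A = c(1, 10, 1,  1,  2,  2, 14, 4),
                 B = c(10, 1, 11, 12, 13, 14, 2, 15))

# Order-independent key for each pair: smaller ID first, larger ID second
lo <- pmin(df$A, df$B)
hi <- pmax(df$A, df$B)

# Keep only the first occurrence of each unordered pair
result <- df[!duplicated(data.frame(lo, hi)), ]
result
```

As with the `apply()` version, the rows that are kept retain their original orientation and row names; only the later duplicate (for example 10 - 1 after 1 - 10) is dropped.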
