I have a data frame in R that contains the gene IDs of paralogous genes in Arabidopsis, looking something like this:
gene_x gene_y
AT1 AT2
AT3 AT4
AT1 AT2
AT1 AT3
AT2 AT1
with the 'ATx' corresponding to the gene names.
Now, for downstream analysis, I want to continue with only the unique pairs. Some pairs are simple duplicates and can be removed easily with the duplicated() function.
However, the fifth row in the artificial data frame above is also a duplicate, just in reversed order, and it is not picked up by either the duplicated() or the unique() function.
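To make the problem reproducible, here is a minimal sketch of the example data and of what duplicated() reports for it. The data frame name `paralogs` is just a placeholder I chose for illustration; the values are copied from the table above.

```r
# Example data; `paralogs` is a hypothetical name, values are from the table above.
paralogs <- data.frame(
  gene_x = c("AT1", "AT3", "AT1", "AT1", "AT2"),
  gene_y = c("AT2", "AT4", "AT2", "AT3", "AT1"),
  stringsAsFactors = FALSE
)

# duplicated() compares whole rows in column order, so it flags only row 3,
# which is an exact repeat of row 1. Row 5 (AT2, AT1) is not flagged.
dup_flags <- duplicated(paralogs)
print(dup_flags)
# [1] FALSE FALSE  TRUE FALSE FALSE

# unique() therefore still returns 4 rows, including the reversed pair.
print(nrow(unique(paralogs)))
# [1] 4
```

The result I am after would keep only three rows: AT1/AT2, AT3/AT4 and AT1/AT3.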
Any ideas on how to remove these rows?