I have a data set in which some rows are reciprocals of each other. That is, if you swapped the gene_a and gene_b values (together with their frequencies) in one of the two rows, the rows would be identical. Is there a way to filter such rows, keeping only one from each pair?
ds <- structure(list(gene_a = c("CACNA2D4", "CTNND2", "GCN1L1", "ROBO2",
"MLL2", "ZNF521", "ITPR3", "STAB1", "DSP", "ZNF676", "LAMC1",
"NLRP2", "PCDHGA10", "PRDM16", "PTPRB", "PXDN", "CTNND2", "FBN3",
"KIF20B", "MYOF"), gene_a_freq = c(0.0303030303030303, 0.0303030303030303,
0.0656565656565657, 0.0454545454545455, 0.0555555555555556, 0.0353535353535354,
0.0404040404040404, 0.0353535353535354, 0.0303030303030303, 0.0353535353535354,
0.0303030303030303, 0.0404040404040404, 0.0303030303030303, 0.0303030303030303,
0.0303030303030303, 0.0303030303030303, 0.0303030303030303, 0.0353535353535354,
0.0303030303030303, 0.0353535353535354), gene_b = c("CTNND2",
"CACNA2D4", "ROBO2", "GCN1L1", "ZNF521", "MLL2", "STAB1", "ITPR3",
"ZNF676", "DSP", "PTPRB", "PRDM16", "PXDN", "NLRP2", "LAMC1",
"PCDHGA10", "FBN3", "CTNND2", "MYOF", "KIF20B"), gene_b_freq = c(0.0303030303030303,
0.0303030303030303, 0.0454545454545455, 0.0656565656565657, 0.0353535353535354,
0.0555555555555556, 0.0353535353535354, 0.0404040404040404, 0.0353535353535354,
0.0303030303030303, 0.0303030303030303, 0.0303030303030303, 0.0303030303030303,
0.0404040404040404, 0.0303030303030303, 0.0303030303030303, 0.0353535353535354,
0.0303030303030303, 0.0353535353535354, 0.0303030303030303)), .Names = c("gene_a",
"gene_a_freq", "gene_b", "gene_b_freq"), row.names = c(NA, 20L
), class = "data.frame")
For example, in the printed data below, if you swapped gene_a with gene_b and gene_a_freq with gene_b_freq in row 2, it would be the same as row 1. The reciprocal rows aren't always adjacent. I'd like to keep only one of the two, so in this example drop row 2 and keep row 1.
     gene_a gene_a_freq   gene_b gene_b_freq
1  CACNA2D4  0.03030303   CTNND2  0.03030303
2    CTNND2  0.03030303 CACNA2D4  0.03030303
3    GCN1L1  0.06565657    ROBO2  0.04545455
4     ROBO2  0.04545455   GCN1L1  0.06565657
5      MLL2  0.05555556   ZNF521  0.03535354
6    ZNF521  0.03535354     MLL2  0.05555556
7     ITPR3  0.04040404    STAB1  0.03535354
8     STAB1  0.03535354    ITPR3  0.04040404
9       DSP  0.03030303   ZNF676  0.03535354
10   ZNF676  0.03535354      DSP  0.03030303
11    LAMC1  0.03030303    PTPRB  0.03030303
12    NLRP2  0.04040404   PRDM16  0.03030303
13 PCDHGA10  0.03030303     PXDN  0.03030303
14   PRDM16  0.03030303    NLRP2  0.04040404
15    PTPRB  0.03030303    LAMC1  0.03030303
16     PXDN  0.03030303 PCDHGA10  0.03030303
17   CTNND2  0.03030303     FBN3  0.03535354
18     FBN3  0.03535354   CTNND2  0.03030303
19   KIF20B  0.03030303     MYOF  0.03535354
20     MYOF  0.03535354   KIF20B  0.03030303
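To make the goal concrete, here is a rough sketch of one possible approach, not necessarily the best one. It assumes that two rows count as reciprocal only when both the gene names and their frequencies match after swapping. The toy data frame and the names toy, side_a, side_b, key and deduped are made up for illustration; the idea is to build an order-independent key per row and keep the first row seen for each key.

```r
# Hypothetical toy data in the same layout as ds (values invented for illustration)
toy <- data.frame(
  gene_a      = c("A", "C", "B", "D", "E"),
  gene_a_freq = c(0.1, 0.3, 0.2, 0.2, 0.5),
  gene_b      = c("B", "D", "A", "C", "F"),
  gene_b_freq = c(0.2, 0.2, 0.1, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Describe each side of the row as "gene frequency"
side_a <- paste(toy$gene_a, toy$gene_a_freq)
side_b <- paste(toy$gene_b, toy$gene_b_freq)

# Order-independent key: the alphabetically smaller side always goes first,
# so a row and its swapped counterpart produce the same key
key <- paste(pmin(side_a, side_b), pmax(side_a, side_b), sep = " | ")

# Keep only the first row for each key (rows 3 and 4 mirror rows 1 and 2)
deduped <- toy[!duplicated(key), ]
print(deduped)
```

The same steps should carry over to ds by using ds in place of toy. Because duplicated() flags only the later occurrences, the first row of each reciprocal pair is the one that is kept, which matches "drop row 2, keep row 1".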
Thanks