data.table is very useful, but I could not find an elegant way to solve the following problem. There are some similar answers out there, but none of them solved it. Let's say the data.table below is my input, and I want to remove duplicate rows based on the gene pair (Gene1 and Gene2), treating a pair as the same regardless of order, so that (EGFR, CD4) and (CD4, EGFR) count as duplicates.
   Gene1 Gene2          Ens.ID.1         Ens.ID.2      CORR
1: FOXA1   MYC ENSG000000129.13. ENSG000000129.11 0.9953311
2:  EGFR   CD4     ENSG000000129 ENSG000000129.12 0.9947215
3:   CD4  EGFR  ENSG000000129.12 ENSG000000129.11 0.9940735
4:  EGFR   CD4     ENSG000000129 ENSG000000129.12 0.9947215
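Rows 2 and 4 have the same gene pair, and row 3 is that pair reversed. When there are such duplicates with respect to Gene1 and Gene2, I want to keep only the first occurrence of each pair and get this: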
   Gene1 Gene2          Ens.ID.1         Ens.ID.2      CORR
1: FOXA1   MYC ENSG000000129.13. ENSG000000129.11 0.9953311
2:  EGFR   CD4     ENSG000000129 ENSG000000129.12 0.9947215
Doing this with ordinary, non-vectorized code is very slow over millions of rows. Is there an elegant and fast way of doing this in data.table?
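For reference, here is a minimal sketch of one way this could be approached. It is an illustration, not a benchmarked solution. It assumes Gene1 and Gene2 are character columns (not factors), and that keeping the first occurrence of each pair is acceptable. The idea is to build an order-independent key with `pmin()` and `pmax()`, which work element-wise on character vectors, and then drop rows whose key has already been seen. The names `dt`, `pair_key`, `keep`, and `result` are made up for the example.

```r
library(data.table)

# Example data copied from the table above
dt <- data.table(
  Gene1    = c("FOXA1", "EGFR", "CD4", "EGFR"),
  Gene2    = c("MYC", "CD4", "EGFR", "CD4"),
  Ens.ID.1 = c("ENSG000000129.13.", "ENSG000000129",
               "ENSG000000129.12", "ENSG000000129"),
  Ens.ID.2 = c("ENSG000000129.11", "ENSG000000129.12",
               "ENSG000000129.11", "ENSG000000129.12"),
  CORR     = c(0.9953311, 0.9947215, 0.9940735, 0.9947215)
)

# Order-independent key: the "smaller" gene name always goes in column a,
# the "larger" in column b, so (EGFR, CD4) and (CD4, EGFR) give the same key.
# Assumption: both columns are character; pmin/pmax do not work on factors.
pair_key <- data.table(
  a = pmin(dt$Gene1, dt$Gene2),
  b = pmax(dt$Gene1, dt$Gene2)
)

# duplicated() flags every row whose key already appeared earlier,
# so negating it keeps the first occurrence of each unordered pair.
keep   <- !duplicated(pair_key)
result <- dt[keep]

print(result)
```

Because `pmin()`, `pmax()`, and `duplicated()` are all vectorized, this avoids any row-by-row loop. The same logic can be written more compactly as `dt[!duplicated(data.table(pmin(Gene1, Gene2), pmax(Gene1, Gene2)))]`.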