
Here is an example of what I am trying to achieve. Consider the following data.table:

library(data.table)
x <- data.table(
  Org = c('BNE', 'SIN', 'ADL', 'SIN', 'SYD', 'MEL', 'BNE'), 
  Dest = c('ADL', 'MEL', 'SYD', 'BNE', 'ADL', 'SIN', 'ADL')
)
> x
   Org Dest
1: BNE  ADL
2: SIN  MEL
3: ADL  SYD
4: SIN  BNE
5: SYD  ADL
6: MEL  SIN
7: BNE  ADL

I am trying to get to the following table:

   Org Dest
1: BNE  ADL
2: SIN  MEL
3: ADL  SYD
4: SIN  BNE

In x, for my use case, rows 1 and 7, rows 2 and 6, and rows 3 and 5 are duplicates, i.e. the order of the values within the de-duplicating columns does not matter, and MEL-SIN is the same as SIN-MEL.

There is a similar question here, but that question considers matrices, which makes the use of apply functions possible. That approach may not be economical given that I have a large data.table of ~35M rows.

What might be the fastest way to go from x to the bottom data.table?
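
For reference, one possible sketch of an order-insensitive de-duplication, assuming both columns are character vectors: build a canonical key with the vectorised base R functions pmin() and pmax(), then keep the first occurrence of each key with duplicated(). The names lo, hi, keep and res are made up for illustration. This avoids row-wise apply, but whether it is the fastest option at ~35M rows has not been established here.

```r
library(data.table)

x <- data.table(
  Org  = c('BNE', 'SIN', 'ADL', 'SIN', 'SYD', 'MEL', 'BNE'),
  Dest = c('ADL', 'MEL', 'SYD', 'BNE', 'ADL', 'SIN', 'ADL')
)

# Canonical key per row: the alphabetically smaller code goes in `lo`,
# the larger in `hi`, so BNE-ADL and ADL-BNE produce the same key.
key <- data.table(lo = pmin(x$Org, x$Dest), hi = pmax(x$Org, x$Dest))

# TRUE for the first occurrence of each unordered pair, FALSE for repeats.
keep <- !duplicated(key)

# Subset the original table so Org/Dest keep their original orientation.
res <- x[keep]
print(res)
```

Because the key is only used to decide which rows to keep, the surviving rows retain the Org/Dest order they had in x.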

Ameya
    Some useful links - https://stackoverflow.com/questions/46943753/r-data-table-duplicate-rows-with-a-pair-of-columns/46944062 or https://stackoverflow.com/questions/25145982/extract-unique-rows-from-a-data-table-with-each-row-unsorted/25151395#25151395 – thelatemail Dec 03 '17 at 23:34
