Here is an example of what I am trying to achieve. Consider the following data.table -
library(data.table)
x <- data.table(
Org = c('BNE', 'SIN', 'ADL', 'SIN', 'SYD', 'MEL', 'BNE'),
Dest = c('ADL', 'MEL', 'SYD', 'BNE', 'ADL', 'SIN', 'ADL')
)
> x
Org Dest
1: BNE ADL
2: SIN MEL
3: ADL SYD
4: SIN BNE
5: SYD ADL
6: MEL SIN
7: BNE ADL
I am trying to get to the following table -
Org Dest
1: BNE ADL
2: SIN MEL
3: ADL SYD
4: SIN BNE
In x, for my use case, rows 1 and 7, rows 2 and 6, and rows 3 and 5 are duplicates, i.e. the order of the values within the de-duplicating columns does not matter, so MEL-SIN is the same as SIN-MEL.
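To make that definition of a duplicate concrete, here is a minimal sketch that builds an order-insensitive key with pmin/pmax and keeps the first row for each unordered pair. The names pair_key, lo, hi and res are only illustrative, and I have not benchmarked this at ~35M rows, so it pins down the expected result rather than claiming to be the fastest approach.

```r
library(data.table)

x <- data.table(
  Org  = c('BNE', 'SIN', 'ADL', 'SIN', 'SYD', 'MEL', 'BNE'),
  Dest = c('ADL', 'MEL', 'SYD', 'BNE', 'ADL', 'SIN', 'ADL')
)

# Order-insensitive key (illustrative names): the alphabetically smaller
# code goes in `lo`, the larger in `hi`, so MEL-SIN and SIN-MEL both
# become (MEL, SIN).
pair_key <- x[, .(lo = pmin(Org, Dest), hi = pmax(Org, Dest))]

# Keep the first row of x seen for each unordered pair.
res <- x[!duplicated(pair_key)]
res
```

pmin and pmax are vectorised, so this avoids a row-wise apply; the original Org/Dest orientation of the first occurrence is preserved in res.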
There is a similar question here, but that question considers matrices, which makes the use of apply functions possible. That may not be economical given that I have a large data.table of ~35M rows.
What might be the fastest way to go from x to the bottom data.table?