Edit 2019: This question was asked prior to changes made to data.table in November 2016; see the accepted answer below for both the current and previous methods.
I have a data.table with about 2.5 million rows and two columns. I want to remove any rows that are duplicated across both columns. Previously, for a data.frame, I would have done this:
df <- unique(df[, c('V1', 'V2')])
but this doesn't work with a data.table. I have tried unique(df[, c(V1, V2), with = FALSE]),
but it still seems to operate only on the key of the data.table and not on the whole row.
Any suggestions?
Cheers, Davy
Example
>dt
V1 V2
[1,] A B
[2,] A C
[3,] A D
[4,] A B
[5,] B A
[6,] C D
[7,] C D
[8,] E F
[9,] G G
[10,] A B
In the above data.table, where V2 is the table key, only rows 4, 7, and 10 would be removed.
dt <- data.table::data.table(
  V1 = c("A", "A", "A", "A", "B", "C", "C", "E", "G", "A"),
  V2 = c("B", "C", "D", "B", "A", "D", "D", "F", "G", "B")
)