I can get the duplicated rows of a data.table dt in R using
dt[duplicated(dt, by=someColumns)]
However, I would like to get each duplicated row together with its "non-duplicate", i.e. the earlier row that it duplicates. For example, consider dt:
col1 col2 col3
A    B    C1
A    B    C2
A    B1   C1
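To make the example reproducible, here is a sketch that builds the table above and shows the logical mask that duplicated() produces for the key (col1, col2). Only the column names and values come from the question; the rest is illustration.

```r
library(data.table)

# The example table from the question
dt <- data.table(
  col1 = c("A", "A", "A"),
  col2 = c("B", "B", "B1"),
  col3 = c("C1", "C2", "C1")
)

# duplicated() flags a row only if its (col1, col2) key already appeared above it,
# so the first occurrence of a repeated key is never flagged
mask <- duplicated(dt, by = c("col1", "col2"))
print(mask)
#> [1] FALSE  TRUE FALSE
```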
Now, dt[duplicated(dt, by = c("col1", "col2"))]
returns only the second row:
col1 col2 col3
A    B    C2
I would like to get this row together with the row that duplicated() did not mark as a duplicate, that is:
col1 col2 col3
A    B    C1
A    B    C2
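One way to get every row of a repeated key, and the idea behind the first approach timed below, is to OR a forward scan with a backward scan (fromLast = TRUE): a row is kept if another row shares its key either above or below it. A minimal sketch on the example data; the helper names key_cols and is_dup are just for illustration.

```r
library(data.table)

dt <- data.table(
  col1 = c("A", "A", "A"),
  col2 = c("B", "B", "B1"),
  col3 = c("C1", "C2", "C1")
)
key_cols <- c("col1", "col2")

# Forward scan flags later copies, backward scan flags earlier copies;
# together they flag every row whose key occurs more than once
is_dup <- duplicated(dt, by = key_cols) |
  duplicated(dt, by = key_cols, fromLast = TRUE)

# Keeps the two A/B rows (C1 and C2) in their original order, drops the A/B1 row
res <- dt[is_dup]
print(res)
```

Because both scans are vectorised over the key columns and the logical subset keeps the original row and column order, the result lines up directly with the desired output shown above.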
Speed comparison of answers:
> system.time(dt[duplicated(dt, by = t) | duplicated(dt, by = t, fromLast = TRUE)])
user system elapsed
0.008 0.000 0.009
> system.time(dt[, .SD[.N > 1], by = t])
user system elapsed
77.555 0.100 77.703
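The gap is plausibly explained by the second approach building and subsetting a separate .SD table for every group, while duplicated() makes vectorised passes over the key columns. The post does not show how dt or t were built, so the sketch below reruns both approaches on hypothetical synthetic data (size, column contents and the seed are assumptions) and checks that they select the same rows before timing them. Timings will differ from the ones above.

```r
library(data.table)

set.seed(1)
# Hypothetical synthetic data: 20k rows over ~13k possible (col1, col2) keys,
# so some keys repeat and many occur once; col3 is a unique row id
n <- 2e4
dt <- data.table(
  col1 = sample(letters, n, replace = TRUE),
  col2 = sample(500L, n, replace = TRUE),
  col3 = seq_len(n)
)
t <- c("col1", "col2")  # grouping columns, named as in the benchmark above

# Approach 1: vectorised duplicated() scans in both directions
res_dup <- dt[duplicated(dt, by = t) | duplicated(dt, by = t, fromLast = TRUE)]

# Approach 2: per-group subsetting of .SD (group columns come first in the result,
# and rows are grouped by key rather than kept in original order)
res_sd <- dt[, .SD[.N > 1], by = t]

# Same rows selected: compare the unique row ids regardless of order
same_rows <- identical(sort(res_dup$col3), sort(res_sd$col3))
print(same_rows)

print(system.time(dt[duplicated(dt, by = t) | duplicated(dt, by = t, fromLast = TRUE)]))
print(system.time(dt[, .SD[.N > 1], by = t]))
```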