Given a data.table in R, I want to find rows that are the reversed version of a previous row. For example:
>head(DT)
V1 V2
1 nameA nameB
2 nameA nameC
3 nameB nameA
4 nameB nameF
5 nameN nameP
6 nameP nameN
In the case of row 1, the code should return row 3. In the case of row 5, the code should return row 6. Eventually, I want to drop the "reversed" rows.
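To make the goal concrete, here is a small reproducible sketch on the six sample rows above. It is not the code I run on the full data, only an illustration of the expected result. It assumes both columns are character vectors and that a tab never appears inside a name; the helper names fwd, rev_key, first_hit and DT_clean are made up for this example.

```r
library(data.table)

# Sample data copied from the example above
DT <- data.table(
  V1 = c("nameA", "nameA", "nameB", "nameB", "nameN", "nameP"),
  V2 = c("nameB", "nameC", "nameA", "nameF", "nameP", "nameN")
)

# One key per row, in forward and in reversed column order.
# Assumption: the separator "\t" never occurs inside a name.
fwd     <- paste(DT$V1, DT$V2, sep = "\t")
rev_key <- paste(DT$V2, DT$V1, sep = "\t")

# match() gives the first row whose forward key equals this row's reversed key
first_hit <- match(rev_key, fwd)

# A row counts as "reversed" if that first matching row comes earlier
is_rev <- !is.na(first_hit) & first_hit < seq_len(nrow(DT))
rm.idx <- which(is_rev)

# Drop the reversed rows
keep     <- !is_rev
DT_clean <- DT[keep]
```

On the sample, rm.idx should contain rows 3 and 6, and DT_clean keeps rows 1, 2, 4 and 5. Note that under this definition a repeated self-pair (for example nameA nameA appearing twice) would also be flagged on its second occurrence.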
The real dataset has 0.5 million rows and 2 columns. At the moment I am using this piece of code, which does the job:
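library(foreach)
library(doMC)
registerDoMC(4)  # run on 4 cores
# For each row i, collect the later rows that hold the same pair in reverse order.
# Parallel workers cannot see results from other iterations, so keep only
# matches with index > i instead of checking a shared rm.idx.
rm.idx <- foreach(i = seq_len(nrow(DT)), .combine = 'c') %dopar% {
  j <- which(DT$V2 == DT$V1[i] & DT$V1 == DT$V2[i])
  j[j > i]
}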
The code returns a vector (rm.idx) that contains the indices of the rows that are the reversed version of a previous row.
However, it takes a long time (more than 30 minutes) for a relatively small data set. I often find that R has some built-in function or trick that does the same job much faster, or that my own code is simply not very efficient. Does anyone know a faster way of finding rows that are the reverse of a previous row?
Thanks in advance for your time.