I have two data frames: raw2, which has 28,406 records, and raw3, which has 26,421 records. The records in raw3 are a subset of those in raw2. In fact, raw3 was derived using:
raw3<-setDT(raw2)[order(O_ID, Program_forsorting), head(.SD, 1), .(O_ID)]
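This derivation sorts raw2 by O_ID and Program_forsorting and keeps the first row of each O_ID group (head(.SD, 1) is evaluated per group). Read that way, the rows that were not carried over are exactly the second and later rows of each group, which duplicated() can pick out directly. Below is a minimal sketch on a hypothetical toy table dt shaped like the sample data; it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy table with the same columns as the sample data
dt <- data.table(
  col1 = c("a", "h", "b", "f", "g"),
  O_ID = c(1, 1, 1, 4, 5),
  Program_forsorting = c("p1", "p2", "p2", "p3", "p1")
)

# Same ordering as the derivation of raw3
sorted <- dt[order(O_ID, Program_forsorting)]

# First row per O_ID: what the derivation keeps
kept <- sorted[!duplicated(O_ID)]

# Second and later rows per O_ID: what the derivation drops
dropped <- sorted[duplicated(O_ID)]

print(dropped)
```

Because data.table's order() is stable, ties such as the two (1, "p2") rows keep their original relative order.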
I'm now using setdiff to pull out the records from raw2 that were not carried over to raw3:
setdiff(raw2, raw3)
The result should have 1,985 records (28,406 - 26,421). However, it has 28,406 records, i.e. all of raw2. If I swap the arguments to setdiff(raw3, raw2), the result contains 26,421 records, i.e. all of raw3.
What am I doing wrong?
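The behaviour above is consistent with how base R's setdiff() works: it is a set operation on vectors, and a data frame is a list of columns, so it compares whole columns rather than rows. No column of raw2 is identical to a column of raw3 (their lengths differ), so every column of raw2 comes back with all its rows. A minimal sketch with hypothetical toy frames df_big and df_small; the exact return type may vary between R versions, so only the column count and contents are relied on here.

```r
# Hypothetical toy data: df_small holds the first two rows of df_big
df_big   <- data.frame(a = 1:3, b = c("x", "y", "z"))
df_small <- df_big[1:2, ]

# Base setdiff() compares the two frames element by element as lists,
# i.e. whole column against whole column, not row against row
res <- setdiff(df_big, df_small)

# Neither column of df_big equals a column of df_small (lengths differ),
# so both columns, with all 3 rows, are returned
length(res)  # number of columns kept
res[[1]]     # the full column a
```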
Here is some sample data:
raw2 <- data.frame(col1 = c("a", "h", "b", "f", "g"), O_ID = c(1, 1, 1, 4, 5), Program_forsorting = c("p1", "p2", "p2", "p3", "p1"))
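A reproducible sketch of a row-wise alternative, assuming the data.table package: fsetdiff() compares rows rather than columns, and an anti-join on all columns (raw2[!raw3, on = names(raw2)]) does the same job. Two details matter here: setDT(raw2) converts raw2 itself to a data.table by reference, and raw3 comes back with O_ID as its first column, so the sketch reorders raw3's columns with setcolorder() before comparing. The names dropped and dropped_aj are illustrative.

```r
library(data.table)

# Sample data from the question, built with data.frame() so O_ID stays numeric
raw2 <- data.frame(
  col1 = c("a", "h", "b", "f", "g"),
  O_ID = c(1, 1, 1, 4, 5),
  Program_forsorting = c("p1", "p2", "p2", "p3", "p1")
)

# Same derivation as in the question; setDT() also turns raw2 itself
# into a data.table, by reference
raw3 <- setDT(raw2)[order(O_ID, Program_forsorting), head(.SD, 1), .(O_ID)]

# raw3 comes back with the grouping column O_ID first, so align its
# column order with raw2 before comparing rows
setcolorder(raw3, names(raw2))

# Row-wise set difference: unique rows of raw2 that are not in raw3
dropped <- fsetdiff(raw2, raw3)

# Alternative: anti-join on every column (keeps duplicate rows of raw2)
dropped_aj <- raw2[!raw3, on = names(raw2)]

print(dropped)
```

On the full data, nrow(dropped) should come to 28,406 - 26,421 = 1,985, provided raw2 contains no exact duplicate rows (fsetdiff() with its default all = FALSE returns unique rows).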