
I have two columns of data (say `id` and `master_id`) in R. Every value in `id` should also be present in `master_id`, but I suspect that is not the case, and I want to identify the erroneous values. I cannot just inspect the data by eye, as I am dealing with on the order of 100k values.

How do I go about finding the erroneous values?

vad
  • If you are looking for different values in two columns you can use `setdiff(id, master_id)`. It will return the values of `id` which are not in `master_id` – DrDom Jun 11 '13 at 20:10
  • This appears to be a duplicate of [one of the top R questions](http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right/1300618#1300618). See also: `?merge` – Jack Ryan Jun 11 '13 at 21:17
  • See comment above. Also the question does not include [a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Jack Ryan Jun 12 '13 at 04:58
  • Also the question does not include what you have already tried. Also the question does not include what you have already searched. – Jack Ryan Jun 12 '13 at 05:12
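A minimal sketch of the `setdiff()` suggestion from the comments, using a small made-up data frame in place of the real 100k-row data. The column names `id` and `master_id` come from the question; the values are invented for illustration:

```r
# Toy stand-in for the real data; the values are invented
DF <- data.frame(
  id        = c(1, 2, 3, 7, 9),
  master_id = c(1, 2, 3, 4, 5)
)

# Distinct values of id that never appear anywhere in master_id
missing_ids <- setdiff(DF$id, DF$master_id)
print(missing_ids)
```

Note that `setdiff()` returns each missing value once, so it tells you *which* values are wrong but not how many rows carry them.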

2 Answers


The `%in%` operator may come in handy. It returns FALSE for each element of the first vector that does not appear in the second.

E.g.

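```r
DF$id %in% DF$master_id
```

Every `id` should appear in `master_id`, so the `id` values without a counterpart get a FALSE. (The reverse, `DF$master_id %in% DF$id`, would instead flag `master_id` values that are missing from `id`.)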

Or, to see how it works, run this example (from the R help file):

```r
1:10 %in% c(1,3,5,9)
```
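To locate the offending rows rather than just flag them, the logical vector from `%in%` can be negated and used as an index. A sketch on invented toy data (`DF` and its values are assumptions, not the asker's data):

```r
# Toy data: ids 40 and 45 have no counterpart in master_id
DF <- data.frame(
  id        = c(10, 20, 30, 40, 45),
  master_id = c(10, 20, 50, 60, 30)
)

# TRUE where the id value does not appear anywhere in master_id
bad <- !(DF$id %in% DF$master_id)

bad_rows   <- which(bad)   # row numbers of the erroneous ids
bad_values <- DF$id[bad]   # the erroneous values (duplicates kept)
print(bad_rows)
print(bad_values)
print(DF[bad, ])           # the offending rows themselves
```

Unlike `setdiff()`, this keeps row positions and repeated values, which helps when tracing where a bad `id` came from.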
SprengMeister

Here's an answer from 2 days ago:

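```r
library(data.table)
DF1 <- data.frame(x = 1:3, y = 4:6, t = 10:12)
DF2 <- data.frame(x = 3:5, y = 6:8, s = 1:3)
# Convert to data.tables keyed on the join columns
DF1 <- data.table(DF1, key = c("x", "y"))
DF2 <- data.table(DF2, key = c("x", "y"))
DF1[!DF2]  # rows of DF1 whose (x, y) has no match in DF2; maybe you want this?
DF2[!DF1]  # rows of DF2 whose (x, y) has no match in DF1; or maybe you want this?
```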
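If data.table is not available, roughly the same anti-join can be done in base R by pasting the key columns into one string per row and comparing with `%in%`. This is a swapped-in technique, not part of the original answer; the helper `row_key()` is hypothetical and assumes the tab separator never occurs inside the key values:

```r
DF1 <- data.frame(x = 1:3, y = 4:6, t = 10:12)
DF2 <- data.frame(x = 3:5, y = 6:8, s = 1:3)

# Hypothetical helper: one string per row built from the key columns.
# Assumes "\t" never appears inside the key values.
row_key <- function(df, cols) do.call(paste, c(df[cols], sep = "\t"))

keys <- c("x", "y")
only_in_DF1 <- DF1[!(row_key(DF1, keys) %in% row_key(DF2, keys)), ]  # like DF1[!DF2]
only_in_DF2 <- DF2[!(row_key(DF2, keys) %in% row_key(DF1, keys)), ]  # like DF2[!DF1]
print(only_in_DF1)
print(only_in_DF2)
```

Base R keeps the original row order, whereas the keyed data.table version returns rows sorted by key; with this toy data the two orders happen to coincide.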
Jack Ryan