I would like to compare two data sets and identify specific instances of discrepancies between them (i.e., which variables were different).
While I have found out how to identify which records are not identical between the two data sets (using the function detailed here: http://www.cookbook-r.com/Manipulating_data/Comparing_data_frames/), I'm not sure how to flag which variables are different.
E.g.
Data set A:
id      name        dob       vaccinedate  vaccinename  dose
100000  John Doe    1/1/2000  5/20/2012    MMR          4
100001  Jane Doe    7/3/2011  3/14/2013    VARICELLA    1
Data set B:
id      name        dob       vaccinedate  vaccinename  dose
100000  John Doe    1/1/2000  5/20/2012    MMR          3
100001  Jane Doee   7/3/2011  3/24/2013    VARICELLA    1
100002  John Smith  2/5/2010  7/13/2013    HEPB         3
I want to identify which records are different, and which specific variable(s) have discrepancies. For example, the John Doe record has 1 discrepancy, in dose, and the Jane Doe record has 2 discrepancies, in name and vaccinedate. Also, data set B has one additional record that was not in data set A, and I would want to identify these instances as well.
In the end, the goal is to find the frequency of the "types" of errors, e.g. how many records have a discrepancy in vaccinedate, vaccinename, dose, etc.
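To make the desired output concrete, here is a rough base R sketch of the comparison I have in mind. It is only an illustration under some assumptions: the data frame names A and B and all helper names (only_in_A, only_in_B, mismatch, diff_vars) are made up for this example, both data sets are assumed to share the same columns with a unique id per record, and there are assumed to be no missing values. I don't know whether this is the idiomatic way to do it.

```r
# Example data from above; A and B are placeholder names.
A <- data.frame(
  id          = c(100000L, 100001L),
  name        = c("John Doe", "Jane Doe"),
  dob         = c("1/1/2000", "7/3/2011"),
  vaccinedate = c("5/20/2012", "3/14/2013"),
  vaccinename = c("MMR", "VARICELLA"),
  dose        = c(4L, 1L),
  stringsAsFactors = FALSE
)
B <- data.frame(
  id          = c(100000L, 100001L, 100002L),
  name        = c("John Doe", "Jane Doee", "John Smith"),
  dob         = c("1/1/2000", "7/3/2011", "2/5/2010"),
  vaccinedate = c("5/20/2012", "3/24/2013", "7/13/2013"),
  vaccinename = c("MMR", "VARICELLA", "HEPB"),
  dose        = c(3L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Records present in only one data set (matched on id).
only_in_A <- setdiff(A$id, B$id)
only_in_B <- setdiff(B$id, A$id)

# Line up the records that appear in both, in the same id order.
common <- intersect(A$id, B$id)
a <- A[match(common, A$id), ]
b <- B[match(common, B$id), ]

# Logical matrix: TRUE where a variable differs for the same id.
# (Assumes no NA values; `!=` would return NA for those cells.)
vars <- setdiff(names(A), "id")
mismatch <- do.call(cbind, lapply(vars, function(v) a[[v]] != b[[v]]))
dimnames(mismatch) <- list(common, vars)

# Frequency of each "type" of error: records with a discrepancy per variable.
colSums(mismatch)

# Names of the variables that differ, per record id.
diff_vars <- setNames(
  lapply(seq_along(common), function(i) vars[mismatch[i, ]]),
  common
)
diff_vars
```

With the example data, I would expect colSums(mismatch) to report one discrepancy each for name, vaccinedate, and dose, and only_in_B to contain 100002.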
Thanks!