I am working with different data sets in R. Each data set will have upto 16 columns and 1000s of records. I am trying to find a way to compare two data sets at a time so that I can find the records that were deleted/Updated/Added. I will be using the ID column and color column to identify the variances. Below is a small set example (have not included all the columns):
df1 <- data.frame(ID = letters[1:5], color = c("blue", "white", "red", "green", "blue"))
df2 <- data.frame(ID = c("a","c","d","d"), color = c("blue", "yellow", "green", "blue"))
ID will be the common factor between the datasets
.
I need to compare the datasets
to get three different set of values:
New Records: records that appear in df1 but not in df2. So I should get:
ID Color
b white
c red
e blue
Deleted records: records that DO NOT appear in df1 but do appear in df2
ID Color
c yellow
d blue
Updated records This is the most important one I need. Basically anything with the same ID that has a different color:
ID df1color df2color
c red yellow
I have tried using the joins from dplyr
package....but was not successful. Is there a way I can do this in R
.