My two test datasets:

df1 <- structure(list(var1 = c(1, 2, 3), 
                      var2 = c("apple", "peach", "orange"), 
                      var3 = c("red","blue","green"),
                      var4 = c("2021-01-01", "2021-12-31", "2021-07-31")
                      ), 
                 row.names = c(NA,-3L), 
                 class = c("tbl_df", "tbl", "data.frame"))


df2 <- structure(list(var1 = c(1, 2, 3), 
                      var2 = c("apple", "peach", "orange"), 
                      var3 = c("red","purple","green"),
                      var4 = c("2021-01-01", "2021-12-24", "2021-07-31")
                      ), 
                 row.names = c(NA,-3L),
                 class = c("tbl_df", "tbl", "data.frame"))

I'm looking to compare both tables on var4, so that the resulting table contains the records from df1 whose var4 differs. In the example above, record 2 has a different value in var4, so I need the entire record from df1 stored in df_diff.

df_diff <- structure(list(var1 = c(2), 
                      var2 = c("peach"), 
                      var3 = c("blue"),
                      var4 = c("2021-12-31")
                      ), 
                 row.names = c(NA,-1L),
                 class = c("tbl_df", "tbl", "data.frame"))
Andres Mora

1 Answer

dplyr

You can use dplyr::anti_join, which keeps the rows of df1 that have no match in df2 on var4.

library(dplyr)

anti_join(df1, df2, by = "var4")
# A tibble: 1 x 4
   var1 var2  var3  var4      
  <dbl> <chr> <chr> <chr>     
1     2 peach blue  2021-12-31

base R

# keep the rows of df1 whose var4 does not occur in df2
df1[!(df1$var4 %in% df2$var4), ]
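
To check the base R filter against the sample data, here is a minimal self-contained sketch. It rebuilds the two tables as plain data.frames rather than tibbles, to avoid extra packages. The second variant compares var4 record by record. It assumes that var1 is a unique record id shared by both tables, which the question does not state, and the names idx and df_diff_by_id are only illustrative.

```r
# Sample data from the question, rebuilt as plain data.frames
df1 <- data.frame(
  var1 = c(1, 2, 3),
  var2 = c("apple", "peach", "orange"),
  var3 = c("red", "blue", "green"),
  var4 = c("2021-01-01", "2021-12-31", "2021-07-31"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  var1 = c(1, 2, 3),
  var2 = c("apple", "peach", "orange"),
  var3 = c("red", "purple", "green"),
  var4 = c("2021-01-01", "2021-12-24", "2021-07-31"),
  stringsAsFactors = FALSE
)

# Rows of df1 whose var4 value appears nowhere in df2
df_diff <- df1[!(df1$var4 %in% df2$var4), ]

# Assumption (not stated in the question): var1 is a unique record id.
# Match each df1 record to its df2 counterpart and compare var4 per record;
# records with no counterpart in df2 are kept as well.
idx <- match(df1$var1, df2$var1)
df_diff_by_id <- df1[is.na(idx) | df1$var4 != df2$var4[idx], ]

print(df_diff)
```

On this sample both variants return the single record with var1 = 2. They can differ on real data: the first only asks whether a var4 value exists anywhere in df2, so a changed date that happens to match another record's date would go unnoticed.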

data.table

library(data.table)

# setDT() converts df1 to a data.table by reference
setDT(df1)[!df2, on = "var4"]
Maël