identify observations based on 2 elements in 2 dataframes that do not match

Question

I want to identify observations in 1 df that do not match that of another df using 2 indicators (id and date). Below is sample df1 and df2.

df1
id       date          n
12-40   12/22/2018     3
11-08   10/02/2016     11

df2
id       date          interval
12-40   12/22/2018     3
11-08   10/02/2016     32
22-22   11/10/2015     11

I want a df that outputs rows that are in df2, but not in df1, like so. Note that row 3 (based on id and date) of df2 is not in df1.

df3
id       date          interval
22-22   11/10/2015     11

I tried doing this in tidyverse and was not able to get the code to work. Does anyone have suggestions on how to do this?

Duck · Answer 1 · 2020-12-07T22:38:20.107

Try this (Both options are base R, follow OP directions and do not require any package):

#Code1
df3 <- df2[!paste(df2$id,df1$date) %in% paste(df1$id,df2$date),]

Output:

     id       date interval
3 22-22 11/10/2015       11

It can also be considered:

#Code 2
df3 <- subset(df2,!paste(id,date) %in% paste(df1$id,df1$date))

Output:

     id       date interval
3 22-22 11/10/2015       11

Some data used:

#Data1
df1 <- structure(list(id = c("12-40", "11-08"), date = c("12/22/2018", 
"10/02/2016"), n = c(3L, 11L)), class = "data.frame", row.names = c(NA, 
-2L))

#Data2
df2 <- structure(list(id = c("12-40", "11-08", "22-22"), date = c("12/22/2018", 
"10/02/2016", "11/10/2015"), interval = c(3L, 32L, 11L)), class = "data.frame", row.names = c(NA, 
-3L))

akrun · Accepted Answer · 2020-12-07T22:41:46.700

We can use anti_join from tidyverse (as the OP mentioned about working with tidyverse). Here we use both 'id' and 'date' as mentioned in the OP's post. More complex joins can be done with tidyverse

library(dplyr)
anti_join(df2, df1, by = c('id', 'date'))
#     id       date interval
#1 22-22 11/10/2015       11

Or a similar option with data.table and it should be very efficient

library(data.table)
setDT(df2)[!df1, on = .(id, date)]
#      id       date interval
#1: 22-22 11/10/2015       11

data

df1 <- structure(list(id = c("12-40", "11-08"), date = c("12/22/2018", 
"10/02/2016"), n = c(3L, 11L)), class = "data.frame", row.names = c(NA, 
-2L))

df2 <- structure(list(id = c("12-40", "11-08", "22-22"), date = c("12/22/2018", 
"10/02/2016", "11/10/2015"), interval = c(3L, 32L, 11L)), class = "data.frame",
row.names = c(NA, 
-3L))

score 1 · Answer 3 · answered Dec 07 '20 at 23:08

1

Another base R option using merge + subset + complete.cases

df3 <- subset(
  u <- merge(df1, df2, by = c("id", "date"), all.y = TRUE),
  !complete.cases(u)
)[names(df2)]

which gives

> df3
     id       date interval
3 22-22 11/10/2015       11

answered Dec 07 '20 at 23:08

ThomasIsCoding

96,636
9
24
81

identify observations based on 2 elements in 2 dataframes that do not match

3 Answers3

data