1

I want to identify observations in 1 df that do not match that of another df using 2 indicators (id and date). Below is sample df1 and df2.

df1
id       date          n
12-40   12/22/2018     3
11-08   10/02/2016     11
df2
id       date          interval
12-40   12/22/2018     3
11-08   10/02/2016     32
22-22   11/10/2015     11

I want a df that outputs rows that are in df2, but not in df1, like so. Note that row 3 (based on id and date) of df2 is not in df1.

df3
id       date          interval
22-22   11/10/2015     11

I tried doing this in tidyverse and was not able to get the code to work. Does anyone have suggestions on how to do this?

D. Fowler
  • 601
  • 3
  • 7

3 Answers3

1

Try this (Both options are base R, follow OP directions and do not require any package):

#Code1
df3 <- df2[!paste(df2$id,df1$date) %in% paste(df1$id,df2$date),]

Output:

     id       date interval
3 22-22 11/10/2015       11

It can also be considered:

#Code 2
df3 <- subset(df2,!paste(id,date) %in% paste(df1$id,df1$date))

Output:

     id       date interval
3 22-22 11/10/2015       11

Some data used:

#Data1
df1 <- structure(list(id = c("12-40", "11-08"), date = c("12/22/2018", 
"10/02/2016"), n = c(3L, 11L)), class = "data.frame", row.names = c(NA, 
-2L))

#Data2
df2 <- structure(list(id = c("12-40", "11-08", "22-22"), date = c("12/22/2018", 
"10/02/2016", "11/10/2015"), interval = c(3L, 32L, 11L)), class = "data.frame", row.names = c(NA, 
-3L))
Duck
  • 39,058
  • 13
  • 42
  • 84
1

We can use anti_join from tidyverse (as the OP mentioned about working with tidyverse). Here we use both 'id' and 'date' as mentioned in the OP's post. More complex joins can be done with tidyverse

library(dplyr)
anti_join(df2, df1, by = c('id', 'date'))
#     id       date interval
#1 22-22 11/10/2015       11

Or a similar option with data.table and it should be very efficient

library(data.table)
setDT(df2)[!df1, on = .(id, date)]
#      id       date interval
#1: 22-22 11/10/2015       11

data

df1 <- structure(list(id = c("12-40", "11-08"), date = c("12/22/2018", 
"10/02/2016"), n = c(3L, 11L)), class = "data.frame", row.names = c(NA, 
-2L))

df2 <- structure(list(id = c("12-40", "11-08", "22-22"), date = c("12/22/2018", 
"10/02/2016", "11/10/2015"), interval = c(3L, 32L, 11L)), class = "data.frame",
row.names = c(NA, 
-3L))
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Another base R option using merge + subset + complete.cases

df3 <- subset(
  u <- merge(df1, df2, by = c("id", "date"), all.y = TRUE),
  !complete.cases(u)
)[names(df2)]

which gives

> df3
     id       date interval
3 22-22 11/10/2015       11
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81