1

I have 2 data frames (A and B) of the following structure:

A:

projectID    offerID
   20          12
   20          17 
   32          12
   32          25

B:

 projectID    offerID
   20          12
   20          17 
   32          12

and I'd like to check for pairs that are in A but not in B. So in my example, I'd like to get new df which contains the pairs that are in A but not in B:

projectID    offerID
   32           25

I tried some options; for example:

APairs <- A %>% group_by(projectID, offerID)
BPairs <- B %>% group_by(projectID, offerID)

!(APairs %in% BPairs)

but I'm getting True/False result, which I can't really understand/verify against my data.

Your help will be appreciated!

staove7
  • 560
  • 5
  • 18

3 Answers3

4

In base R:

#define the key columns in the case of different structure between A and B
cols<-c("projectID","offerID")
A[!do.call(paste,A[cols]) %in% do.call(paste,B[cols]),]
#  projectID offerID
#4        32      25
nicola
  • 24,005
  • 3
  • 35
  • 56
3
library(data.table)
setkey(setDT(A))
setkey(setDT(B))
A[!B]                # A[B] is similar to merge() so perform the opposite using !
#   projectID offerID
#1:        32      25

#incase there are extra columns in any of the table, the specify the common columns in a vector
common.col <- c("projectID", "offerID")
setkeyv(setDT(A), cols = common.col)
setkeyv(setDT(B), cols = common.col)
A[!B]
joel.wilson
  • 8,243
  • 5
  • 28
  • 48
2

We can use anti_join from dplyr

 library(dplyr)
 anti_join(A, B)
 #    projectID offerID
 #1        32      25

If there are more number of columns, specify the by option

 anti_join(A, B, by = c("projectID", "offerID"))
 #    projectID offerID
 #1        32      25
akrun
  • 874,273
  • 37
  • 540
  • 662