1

Input Data Frame

DF 1 (example - nrow = 10)

Col A | Col B | Col C
  a       1       2    
  a       3       4    
  b       5       6    
  c       9      10    

DF 2 (example - nrow = 20)

Col A | Col B | Col E
  a       1       22    
  a       31      41    
  a       3       63    
  b       5       6
  b       11      13   
  c       9       20 

I want to create a third data set which contains each of the additional row found in the Data Frame 2 for the Col A and Col B entry.

Output File (nrow = 20-10 = 10)

Col A | Col B | Col E
  a       31      41    
  b       11      13 
Aashay Mehta
  • 118
  • 8

2 Answers2

5
library(dplyr)
anti_join(df2, df1, by = c("ColA", "ColB"))
#   ColA ColB ColE
# 1    a   31   41
# 2    b   11   13

Data:

df1 <- structure(list(ColA = c("a", "a", "b", "c"), ColB = c(1L, 3L, 
5L, 9L), ColC = c(2L, 4L, 6L, 10L)), class = "data.frame", row.names = c(NA, 
-4L))
df2 <- structure(list(ColA = c("a", "a", "a", "b", "b", "c"), ColB = c(1L, 
31L, 3L, 5L, 11L, 9L), ColE = c(22L, 41L, 63L, 6L, 13L, 20L)), class = "data.frame", row.names = c(NA, 
-6L))
r2evans
  • 141,215
  • 6
  • 77
  • 149
1

We can use

library(data.table)
setDT(df2)[!df1, on = .(ColA, ColB)]
#  ColA ColB ColE
#1:    a   31   41
#2:    b   11   13

data

df1 <- structure(list(ColA = c("a", "a", "b", "c"), ColB = c(1L, 3L, 
5L, 9L), ColC = c(2L, 4L, 6L, 10L)), class = "data.frame", row.names = c(NA, 
-4L))
df2 <- structure(list(ColA = c("a", "a", "a", "b", "b", "c"), ColB = c(1L, 
31L, 3L, 5L, 11L, 9L), ColE = c(22L, 41L, 63L, 6L, 13L, 20L)), class = "data.frame", row.names = c(NA, 
-6L))
akrun
  • 874,273
  • 37
  • 540
  • 662