2

I have the following data.tables

library(data.table)


 dt_1 <- data.table(id = c('cg','fs','fs'),
                 v1 = c('a','a','b'),
                 v2 = c('d','e','f'))

dt_2 <- data.table(id = c('cg','fs','cg'),
                   v1 = c('a','a','b'))

And I would like to filter dt_1 based on dt_2. In the end I would like to end up with

   id v1 v2
1: fs  b  f

So filter out the exact rows of dt_1 that are contained in dt_2

This operation

dt_1[!(id%in%dt_2$id & v1%in%dt_2$v1)]

Does not work because it takes also the inner combinations of dt_2$id and dt_2$v1 and this

dt_1[!dt_2] throws an error.

Any ideas ?

zx8754
  • 52,746
  • 12
  • 114
  • 209
quant
  • 4,062
  • 5
  • 29
  • 70

2 Answers2

6

You need to specify on what you're joining:

dt_1[!dt_2, on=names(dt_2)] # "easier" than intersect(names(dt_1), names(dt_2)) because all dt_2 variables are in dt_1 but in a more general context the latter will be preferable
#   id v1 v2
#1: fs  b  f
Cath
  • 23,906
  • 5
  • 52
  • 86
4

We could use anti_join

library(dplyr)
anti_join(dt_1, dt_2)
#  id v1 v2
#1 fs  b  f

Note that if the column names are the same, we could use data.table::fsetdiff

fsetdiff(dt_1, dt_2)
akrun
  • 874,273
  • 37
  • 540
  • 662