Filter a datatable based on another datatable

Question

I have the following data.tables

library(data.table)


 dt_1 <- data.table(id = c('cg','fs','fs'),
                 v1 = c('a','a','b'),
                 v2 = c('d','e','f'))

dt_2 <- data.table(id = c('cg','fs','cg'),
                   v1 = c('a','a','b'))

And I would like to filter dt_1 based on dt_2. In the end I would like to end up with

   id v1 v2
1: fs  b  f

So filter out the exact rows of dt_1 that are contained in dt_2

This operation

dt_1[!(id%in%dt_2$id & v1%in%dt_2$v1)]

Does not work because it takes also the inner combinations of dt_2$id and dt_2$v1 and this

dt_1[!dt_2] throws an error.

Any ideas ?

Cath · Accepted Answer · 2018-05-15T13:48:29.650

6

You need to specify on what you're joining:

dt_1[!dt_2, on=names(dt_2)] # "easier" than intersect(names(dt_1), names(dt_2)) because all dt_2 variables are in dt_1 but in a more general context the latter will be preferable
#   id v1 v2
#1: fs  b  f

edited May 15 '18 at 13:48

answered May 15 '18 at 13:41

Cath

23,906
5
52
86

akrun · Answer 2 · 2018-05-15T13:49:22.110

4

We could use anti_join

library(dplyr)
anti_join(dt_1, dt_2)
#  id v1 v2
#1 fs  b  f

Note that if the column names are the same, we could use data.table::fsetdiff

fsetdiff(dt_1, dt_2)

edited May 15 '18 at 13:49

answered May 15 '18 at 13:40

akrun

874,273
37
540
662

Filter a datatable based on another datatable

2 Answers2