2

I need to do some set operations using only data.table library.

I don't want to join/merge, but set operations. So we do not match column by key, but add rows from different data.tables.

We work on two samples from built-in data sets.

All of the problematic questions are below:

#install.packages("nycflights13")
library(nycflights13)

#install.packages("sqldf")
library(sqldf)

#install.packages("data.table")
library(data.table)

dt_airports <- data.table(airports)

Adt <- dt_airports[1:10]
Bdt <- dt_airports[6:15]

sqldf('SELECT * FROM A INTERSECT SELECT * FROM B')
# can I do it better using data.table?
rbindlist(list(Adt,Bdt))[duplicated(rbindlist(list(Adt,Bdt))),]

sqldf('SELECT * FROM A EXCEPT SELECT * FROM B')
# how to do it using data.table?

sqldf('SELECT * FROM B EXCEPT SELECT * FROM A')
# how to do it using data.table?
zx8754
  • 52,746
  • 12
  • 114
  • 209
AgnieszkaTomczyk
  • 253
  • 2
  • 12
  • Possible duplicate of [How to join (merge) data frames (inner, outer, left, right)?](http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – ArunK Nov 07 '16 at 12:41
  • No. I don't want to do JOINING, but set operetions. So we do not match column by key, but add rows from different data.tables. – AgnieszkaTomczyk Nov 07 '16 at 12:52
  • 1
    You are merging two data.tables based on particular columns. The first line you've asked for is an intersect, the second is a left_join (I think) and the third is a right. – ArunK Nov 07 '16 at 12:57
  • 4
    Jan Gorecki added set operations to the package, available in the devel version under names like `fsetdiff`. See news, installation instructions and "getting started" tutorials on the official website http://r-datatable.com – Frank Nov 07 '16 at 13:01
  • Unfortunately, I can use basic data.table... – AgnieszkaTomczyk Nov 07 '16 at 13:09

2 Answers2

2

To elaborate on Frank's comment, data.table v1.9.8 and above has built-in rows set operation. I would prefer fsetdiff or fintersect solution as there is no need to set keys.

#first problem
fintersect(Adt,Bdt)

#second problem
fsetdiff(Adt,Bdt)

#third problem
fsetdiff(Bdt,Adt)

It was also claimed to be fast: https://github.com/Rdatatable/data.table/blob/master/NEWS.0.md

artidataio
  • 316
  • 4
  • 8
1

Using only data.table:

sqldf('SELECT * FROM A INTERSECT SELECT * FROM B')
# data.table
rbindlist(list(Adt,Bdt))[duplicated(rbindlist(list(Adt,Bdt))),]
# better in data.table
setkeyv(Adt,colnames(Adt)[1:7])
Adt[Bdt, nomatch=0L]

sqldf('SELECT * FROM A EXCEPT SELECT * FROM B')
# data.table
setkeyv(Adt,colnames(Adt)[1:7])
Adt[!Bdt]

sqldf('SELECT * FROM B EXCEPT SELECT * FROM A')
# data.table
setkeyv(Bdt,colnames(Bdt)[1:7])
Bdt[!Adt]
AgnieszkaTomczyk
  • 253
  • 2
  • 12