31

In the data.table FAQ, the nomatch = NA parameter is said to be akin to an outer join. However, I haven't been able to get data.table to do a full outer join – only right outer joins.

For example:

a <- data.table("dog" = c(8:12), "cat" = c(15:19))

   dog cat
1:   8  15
2:   9  16
3:  10  17
4:  11  18
5:  12  19

b <- data.table("dog" = 1:10, "bullfrog" = 11:20)

    dog bullfrog
 1:   1       11
 2:   2       12
 3:   3       13
 4:   4       14
 5:   5       15
 6:   6       16
 7:   7       17
 8:   8       18
 9:   9       19
10:  10       20

setkey(a, dog)
setkey(b, dog)

a[b, nomatch = NA]

    dog cat bullfrog
 1:   1  NA       11
 2:   2  NA       12
 3:   3  NA       13
 4:   4  NA       14
 5:   5  NA       15
 6:   6  NA       16
 7:   7  NA       17
 8:   8  15       18
 9:   9  16       19
10:  10  17       20

So, nomatch = NA produces a right outer join (which is the default). What if I need a full join? For example:

merge(a, b, by = "dog", all = TRUE) 
# Or with plyr:
join(a, b, by = "dog", type = "full")

    dog cat bullfrog
 1:   1  NA       11
 2:   2  NA       12
 3:   3  NA       13
 4:   4  NA       14
 5:   5  NA       15
 6:   6  NA       16
 7:   7  NA       17
 8:   8  15       18
 9:   9  16       19
10:  10  17       20
11:  11  18       NA
12:  12  19       NA

Is that possible with data.table?

Paul Murray
  • 1,002
  • 1
  • 10
  • 24
  • For joins with data.table see last answer of [this post][1] [1]: http://stackoverflow.com/questions/14076065/data-table-inner-outer-join-with-na-in-join-column-of-type-double-bug?rq=1 – statquant Mar 03 '13 at 22:45
  • For all sorts of joins with data.table see last answer of [this post][1] [1]: http://stackoverflow.com/questions/14076065/data-table-inner-outer-join-with-na-in-join-column-of-type-double-bug?rq=1 – statquant Mar 03 '13 at 22:48

3 Answers3

37

You actually have it right there. Use merge.data.table which is exactly what you are doing when you call

merge(a, b, by = "dog", all = TRUE)

since a is a data.table, merge(a, b, ...) calls merge.data.table(a, b, ...)

thelatemail
  • 91,185
  • 12
  • 128
  • 188
Ricardo Saporta
  • 54,400
  • 17
  • 144
  • 178
13
x= data.table(a=1:5,b=11:15)
y= data.table(a=c(1:4,6),c=c(101:104,106))

setkey(x,a)
setkey(y,a)

unique_keys <- unique(c(x[,a], y[,a]))
y[x[.(unique_keys), on="a"]  ] # Full Outer Join
Ashrith Reddy
  • 1,022
  • 1
  • 13
  • 26
  • Brilliant! To make it easier to understand, I'd like to break the last two lines to three: 1) `unique_keys <- data.table(a = unique(c(x[,a], y[,a])))` 2) `xx <- x[unique_keys]` 3) `y[xx]` – Feng Jiang Sep 19 '19 at 12:59
0

A different approach to getting a full join would be to read the full join as a right join plus an anti join:

rbind(
  fill = TRUE,
  a[b, on = ...],
  a[!b, on = ...]
)
Lazy
  • 123
  • 5