Merging data.tables using `on` fails

Question

I have two dataframes of different dimensions, call them df1 and df2. They both have columns of integers called Product and Output.

df2 has the unique column of numbers called ActualOne. I want to append all the values of ActualOne from df2 to df1, where the Product and Output values of df1 are jointly matched with those in df2. For Product,Output pairs not found in df2, ActualOne is populated by NAs.

Attempt #1: Use data.table

library(data.table)
setDT(df1)
setDT(df2)

df1[df2, ActualOne := i.ActualOne, on=c("Product", "Output")]

I am by no means an expert at data.table. I didn't think anything had to be reformatted and thought the temporary storing as a data.table was better.

I received the following error:

Error in [.data.table(df1, df2, :=(ActualOne, i.ActualOne), : unused argument (on = c("Product", "Output"))

Attempt #2: Use a matching function

Off the top of my head something along the lines of:

df1$ActualOne <- df2[match(df1$Product,df2$Product) & match(df1$Output,df2$Output,"ActualOne"]

This results in the ActualOne column of df1 being populated by "ActualOne"

Thanks for your help.

Which version of `data.table` do you have? The `on` feature works from *v1.9.6* or higher. — Jaap, Oct 19 '15 at 18:49
Moreover: always try to give a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). — Jaap, Oct 19 '15 at 18:50

David Arenburg · Answer 1 · 2015-10-19T19:46:54.100

1

You have an older version of data.table. If you want your code to work you need to update to v1.9.6+ - Please check out the updates here.

For older versions, the idiomatic binary join + update would be to first key both data.tables and then join/update. Something like (we don't have actual data to test):

setkey(df1, Product, Output)
setkey(df2, Product, Output)
df1[df2, ActualOne := i.ActualOne]

edited Oct 19 '15 at 19:46

answered Oct 19 '15 at 19:42

David Arenburg

91,361
17
137
196

score 0 · Answer 2 · answered Oct 19 '15 at 19:27

0

Here is one way to do it:

actualTF <- df1$Product == df2$Product & df1$Output == df2$Output
df1$ActualOne <- sapply(1:50, function(x) if (actualTF[x]) return(df2$ActualOne[x]) else NA)

answered Oct 19 '15 at 19:27

smanchan

1
1

DunderChief · Answer 3 · 2015-10-19T20:31:49.383

0

You want the merge() function I think. Here's an example per your description:

library(data.table)
df1 <- data.table(Product=1:10, Output=101:110)
df2 <- data.table(Product=1:3, Output=101:103, ActualOne=c('A', 'B', 'C'))
merge(df1, df2, by=c('Product', 'Output'), all.x=TRUE)

edited Oct 19 '15 at 20:31

answered Oct 19 '15 at 19:33

DunderChief

726
4
7

Merging data.tables using `on` fails

3 Answers3