The data.table
package provides many of the same table handling methods as SQL. If a table has a key, that key consists of one or more columns. But a table can't have more than one key, because it can't be sorted in two different ways at the same time.
In this example, X
and Y
are data.table
s with a single key column "id"; Y
also has a non-key column "x_id".
X <- data.table(id = 1:5, a=4:8,key="id")
Y <- data.table(id = c(1,1, 3,5,7), x_id=c(1,4:1), key="id")
The following syntax would join the tables on their keys:
X[Y]
How can I translate the following SQL syntax to data.table code?
select * from X join Y on X.id = Y.x_id;
The closest that I have gotten is:
Y[X,list(id, x_id),by = x_id,nomatch=0]
However, this does not do the same inner join as the SQL statement.
Here is a more clear example in which the foreign key is y_id, and we want the join to look up values of Y2 where X2$y_id = Y2$id
.
X2 <- data.table(id = 1:5, y_id = c(1,1,2,2,2), key="id")
Y2 <- data.table(id = 1:5, b = letters[1:5], key="id")
I would like to produce the table:
id y_id b
1 1 "a"
2 1 "a"
3 2 "b"
4 2 "b"
5 2 "b"
similar to what is done by the following kludge:
> merge(data.frame(X2), data.frame(Y2), by.x = "y_id", by.y = "id")
y_id id b
1 1 1 a
2 1 2 a
3 2 3 b
4 2 4 b
5 2 5 b
However, when I do this:
X2[Y2, 1:2,by = y_id]
I do not get the desired result:
y_id V1
[1,] 1 1
[2,] 1 2
[3,] 2 1
[4,] 2 2