Yet another way to insert the missing columns (with the correct type and NAs) is to merge()
the first data.table A
with an empty data.table A2[0]
which has the structure of the second data.table. This saves the possibility to introduce bugs in user functions (I know merge()
is more reliable than my own code ;)). Using mnel's tables from above, do something like the code below.
Also, using rbindlist()
should be much faster when dealing with data.tables
.
Define the tables (same as mnel's code above):
library(data.table)
A <- data.table(a=1:3, b=1:3, c=1:3)
A2 <- data.table(a=6:9, b=6:9, c=6:9)
B <- data.table(b=1:3, c=1:3, d=1:3, m=LETTERS[1:3])
C <- data.table(n=round(rnorm(3), 2), f=c(T, F, T), c=7:9)
Insert the missing variables in table A: (note the use of A2[0]
A <- merge(x=A, y=A2[0], by=intersect(names(A),names(A2)), all=TRUE)
Insert the missing columns in table A2:
A2 <- merge(x=A[0], y=A2, by=intersect(names(A),names(A2)), all=TRUE)
Now A
and A2
should have the same columns, with the same types. Set the column order to match, just in case (possibly not needed, not sure if rbindlist()
binds across column names or column positions):
setcolorder(A2, names(A))
DT.ALL <- rbindlist(l=list(A,A2))
DT.ALL
Repeat for the other tables... Maybe it would be better to put this into a function rather than repeat by hand...
DT.ALL <- merge(x=DT.ALL, y=B[0], by=intersect(names(DT.ALL), names(B)), all=TRUE)
B <- merge(x=DT.ALL[0], y=B, by=intersect(names(DT.ALL), names(B)), all=TRUE)
setcolorder(B, names(DT.ALL))
DT.ALL <- rbindlist(l=list(DT.ALL, B))
DT.ALL <- merge(x=DT.ALL, y=C[0], by=intersect(names(DT.ALL), names(C)), all=TRUE)
C <- merge(x=DT.ALL[0], y=C, by=intersect(names(DT.ALL), names(C)), all=TRUE)
setcolorder(C, names(DT.ALL))
DT.ALL <- rbindlist(l=list(DT.ALL, C))
DT.ALL
The result looks the same as mnels' output (except for the random numbers and the column order).
PS1: The original author does not say what to do if there are matching variables -- do we really want to do a rbind()
or are we thinking of a merge()
?
PS2: (Since I do not have enough reputation to comment) The gist of the question seems a duplicate of this question. Also important for the benchmarking of data.table
vs. plyr
with large datasets.