As a simplified example of my problem, say I have four data.tables dt1, ..., dt4, all with the same structure:
head(dt1)
date x y
1: 2000-10-01 0.4527087 -0.11590788
2: 2001-10-01 0.7200252 -0.55722270
3: 2002-10-01 -1.3804472 -1.47030087
4: 2003-10-01 -0.1380225 2.34157766
5: 2004-10-01 -0.9288675 -1.32993998
6: 2005-10-01 -0.9592633 0.76316150
That is, they all have three columns called date, x and y. My desired output is a single data.table, merged on date, with five columns: date, and then the x column from each individual table, renamed to reflect the data.table it came from:
head(desired_output)
date x_dt1 x_dt2 x_dt3 x_dt4
1: 2000-10-01 0.4527087 -0.11590788 1.1581946 -1.5159040
2: 2001-10-01 0.7200252 -0.55722270 -1.6247254 -0.3325556
3: 2002-10-01 -1.3804472 -1.47030087 -0.9766309 -0.2368857
4: 2003-10-01 -0.1380225 2.34157766 1.1831091 -0.4399184
5: 2004-10-01 -0.9288675 -1.32993998 0.8716144 -0.4086229
6: 2005-10-01 -0.9592633 0.76316150 -0.8860816 -0.4299365
I assume this can be done somehow using the suffixes argument of merge.data.table. I have tried to modify mergeDTs from this answer, without success so far. A solution that successfully modifies mergeDTs (or just a function that could be applied to a list of several data.tables) would be excellent.
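For reference, here is a minimal sketch of what suffixes does on its own, using three small made-up tables (tbl_a, tbl_b and tbl_c are illustrative names, not from the question). In merge.data.table the suffixes are only applied to non-key columns whose names clash between the two tables in that particular merge, so they label one pairwise merge cleanly but do not carry through a Reduce() over more than two tables.

```r
library(data.table)

# Three small tables with the same structure as dt1..dt4 (values are arbitrary)
d <- as.Date(c("2000-10-01", "2001-10-01"))
tbl_a <- data.table(date = d, x = c(1, 2), y = c(3, 4))
tbl_b <- data.table(date = d, x = c(5, 6), y = c(7, 8))
tbl_c <- data.table(date = d, x = c(9, 10), y = c(11, 12))

# Pairwise merge: "x" clashes, so both copies get the requested suffixes
ab <- merge(tbl_a[, .(date, x)], tbl_b[, .(date, x)],
            by = "date", suffixes = c("_a", "_b"))
names(ab)
# "date" "x_a"  "x_b"

# Third merge: "x" no longer clashes with "x_a"/"x_b", so no suffix is added
abc <- merge(ab, tbl_c[, .(date, x)], by = "date", suffixes = c("_ab", "_c"))
names(abc)
# "date" "x_a"  "x_b"  "x"
```

This is why renaming each table's columns before merging tends to be simpler than relying on suffixes.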
I am aware of this very slick dplyr/purrr answer but would prefer a data.table solution.
Example data
library(data.table)
dt1 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
dt2 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
dt3 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
dt4 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
Solution
Below I have put B. Christian Kamgang's answer into functional form (to make it easy to adapt to my real problem) and removed its dependency on the new native pipe |> (since my organisation hasn't upgraded R yet):
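merge_select <- function(on, vars, ..., suffix = "_") {
  dts <- list(...)
  # Name each table after the expression passed in, e.g. "dt1"
  names(dts) <- sapply(as.list(substitute(list(...)))[-1L], deparse)
  nv <- length(vars)
  ndt <- length(dts)
  # One vector of original column names per table
  old_cols <- split(rep(vars, ndt),
                    ceiling(seq_along(rep(vars, ndt))/nv))
  # One vector of suffixed column names per table, e.g. "x_dt1"
  new_cols <- split(paste0(vars, suffix, rep(names(dts), each = nv)),
                    ceiling(seq_along(paste0(vars,
                                             suffix,
                                             rep(names(dts), each = nv)))/nv))
  # Keep only the key and the selected columns (subset() returns copies)
  sep_cols <- lapply(dts, function(x) subset(x, select = c(on, vars)))
  # Rename the columns of each copy, then merge the tables pairwise on the key
  Reduce(f = function(x, y) merge(x, y, by = on),
         Map(f = setnames, sep_cols, old_cols, new_cols))
}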
Which in my situation translates to:
merge_select("date", "x", dt1, dt2, dt3, dt4)
date x_dt1 x_dt2 x_dt3 x_dt4
1: 2000-10-01 -0.6365707 0.11804268 -0.01084163 -0.88127011
2: 2001-10-01 -0.2533127 -3.16924568 0.45746415 0.69742537
3: 2002-10-01 2.3069266 -0.82670409 -0.54236745 -1.49613384
4: 2003-10-01 0.7075547 -0.91809007 -0.67888707 -0.26106146
5: 2004-10-01 -0.7165651 -0.45711888 -0.83903416 1.45113260
6: 2005-10-01 0.5703561 0.24587897 0.13862020 0.33928202
7: 2006-10-01 -0.6258097 -0.77652389 -0.49252474 -0.80460241
8: 2007-10-01 -0.4600565 0.55612959 0.86749410 -1.30850411
9: 2008-10-01 -0.8841649 -0.48113848 -1.55858406 0.83076846
10: 2009-10-01 -0.6262272 -0.73618265 0.13350581 0.06640803
11: 2010-10-01 0.1406454 0.08994779 1.28450204 -1.18329081
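Since the question also asked for something that could be applied to a list of data.tables, here is a minimal sketch of a variant that takes a named list instead of ..., so the suffixes come from the list names rather than from deparsing the arguments. merge_select_list and make_dt are hypothetical names introduced here for illustration; the renaming-then-Reduce(merge) logic is the same idea as merge_select above.

```r
library(data.table)

# Hypothetical variant of merge_select(): takes a *named* list of data.tables
merge_select_list <- function(dt_list, on, vars, suffix = "_") {
  stopifnot(!is.null(names(dt_list)), all(nzchar(names(dt_list))))
  renamed <- Map(function(dt, nm) {
    # with = FALSE selects by a character vector and returns a new table,
    # so setnames() below leaves the caller's tables untouched
    out <- dt[, c(on, vars), with = FALSE]
    setnames(out, vars, paste0(vars, suffix, nm))
    out
  }, dt_list, names(dt_list))
  # Merge the renamed tables pairwise on the key column(s)
  Reduce(function(a, b) merge(a, b, by = on), renamed)
}

# Usage with freshly generated example tables (hypothetical helper make_dt)
set.seed(1)
make_dt <- function() data.table(
  date = seq(as.Date("2000-10-01"), as.Date("2010-10-01"), by = "years"),
  x = rnorm(11), y = rnorm(11))
dts <- list(dt1 = make_dt(), dt2 = make_dt(), dt3 = make_dt(), dt4 = make_dt())

res <- merge_select_list(dts, on = "date", vars = "x")
names(res)
# "date"  "x_dt1" "x_dt2" "x_dt3" "x_dt4"

res2 <- merge_select_list(dts, on = "date", vars = c("x", "y"))
names(res2)
# "date"  "x_dt1" "y_dt1" "x_dt2" "y_dt2" "x_dt3" "y_dt3" "x_dt4" "y_dt4"
```

Passing an explicit named list avoids the substitute()/deparse() step, which only yields clean names when the tables are passed as bare symbols.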
This solution also works for multiple variables, e.g.:
merge_select("date", c("x","y"), dt1, dt2, dt3, dt4)
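As a point of comparison, and swapping in a different technique from the repeated merges above, the same wide layout can be sketched by stacking the tables with rbindlist() and reshaping with dcast(). This assumes the tables are in a named list (dts and make_dt below are illustrative) and that each date appears at most once per table; with duplicates, dcast() falls back to an aggregate function. Note also that merge() with its default all = FALSE keeps only dates present in every table, whereas dcast() keeps every date and fills gaps with NA.

```r
library(data.table)

set.seed(1)
make_dt <- function() data.table(
  date = seq(as.Date("2000-10-01"), as.Date("2010-10-01"), by = "years"),
  x = rnorm(11), y = rnorm(11))
dts <- list(dt1 = make_dt(), dt2 = make_dt(), dt3 = make_dt(), dt4 = make_dt())

# Stack the tables, recording the list name of each row's source in "src"
long <- rbindlist(dts, idcol = "src")

# Several value columns: dcast names the results <value.var>_<src>, e.g. x_dt1
wide <- dcast(long, date ~ src, value.var = c("x", "y"))

# A single value column: dcast uses the bare src values ("dt1", ...),
# so add the "x_" prefix afterwards
wide_x <- dcast(long, date ~ src, value.var = "x")
setnames(wide_x, names(dts), paste0("x_", names(dts)))
names(wide_x)
```

Which approach reads better probably depends on whether the long format is useful elsewhere in the real problem.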