Please help a beginner improve their understanding of the amazing data.table package on an basic problem.
I want to update a data table dt
with new snippets of information with the same structure and column names dt2
, ticker / date / price. Key is ticker and date.
I know one could use
rbind(dt, dt2)
looking here or here, but I don't want to risk duplicates, so thought rather join them through a merge.I was expecting
merge(dt, dt2, all = TRUE)
to do the trick.But you get x.price y.price and not a single price column!?
Why does
dt
not match the shared columns in 2? Data frame docs imply they use intersect of names but DT seems different.Currently, I am getting around it with
by = colnames(dt)
i.e.merge(dt, dt2, all = TRUE, by = colnames(dt))
, but is this the right way to do it?
Thanks I would appreciate some guidance/confirmation (and without resorting to another package).
Example Code
library(data.table)
# Main data
dt <- data.table(ticker = c("A", "A"),
date = c("Jan", "Feb"),
price = c(10, 11),
key = c("ticker", "date"))
# Update data with same column names
dt2 <- data.table(ticker = c("A", "B"),
date = c("Mar", "Mar"),
price = c(12, 100),
key = c("ticker", "date"))
# Lets put them together
merge(dt, dt2, all = TRUE)
# but get 4 x 4 back (not 4 x 3) with price.x price.y ???
ticker date price.x price.y
1: A Feb 11 NA
2: A Jan 10 NA
3: A Mar NA 12
4: B Mar NA 100
# This works, but am I missing easier syntax?
merge(dt, dt2, all = TRUE, by = colnames(dt2))
ticker date price
1: A Feb 11
2: A Jan 10
3: A Mar 12
4: B Mar 100
(An aside) Is it better practice in R to be clear and specify merge.data.table
?