I am trying to write to a subset of rows of a data.table by reference in order to deal with training, testing, and excluded rows of data for a model.
However, when I define this subset of rows and attempt to write to it, it breaks the reference without warning. Conceptually, I know that this works:
library('data.table')
a <- data.table(a1=c(0,1), a2=c(2,3))
a
# a1 a2
# 1: 0 2
# 2: 1 3
b <- a
b[,b1:=4]
b
# a1 a2 b1
# 1: 0 2 4
# 2: 1 3 4
a
# a1 a2 b1
# 1: 0 2 4
# 2: 1 3 4
But what I am trying to do is something like:
a <- data.table(a1=c(0,1), a2=c(2,3))
a
b <- a[1,]
b
# a1 a2
# 1: 0 2
b[,b1:=4]
b
# a1 a2 b1
# 1: 0 2 4
a
# a1 a2
# 1: 0 2
# 2: 1 3
# What I would really like is
#>a
# a1 a2 b1
# 1: 0 2 4
# 2: 1 3 NA
I am having a hard time reconciling this behavior with the explanation here which suggests that using the data table assignment := shouldn't break the reference like <- would.
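To make the difference between the two cases concrete, here is a minimal sketch that compares memory addresses with data.table's address() helper. The name s for the subset is only an illustration and is not part of my code above; the point is that b <- a binds a second name to the same object, while a[1,] allocates a new table, so there is no shared reference left for := to preserve.

```r
library(data.table)

a <- data.table(a1 = c(0, 1), a2 = c(2, 3))

b <- a        # plain assignment: b is a second name for the same object
s <- a[1, ]   # row subsetting: s is a newly allocated data.table

same_object   <- address(a) == address(b)   # TRUE, no copy was made
subset_shared <- address(a) == address(s)   # FALSE, the subset is a copy

# := modifies s in place, but s no longer shares anything with a
s[, b1 := 4]
"b1" %in% names(a)   # FALSE
```

So := never copies the table it is called on, but the subset it is called on here was already a separate object.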
I have a key for every row, so merging the scores back is not a big deal. I'm just curious whether there's a way to pass the subset by reference. Basically I am trying to createDataPartition() around some excluded rows and finding the bookkeeping kind of annoying.
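For reference, the desired result can be produced by keeping the row selection inside the original table instead of going through b. This is a minimal sketch, assuming the rows of interest are available as an index vector; train_rows is a hypothetical name used only for illustration, and this may not cover the case where the subset has to be handed to another function.

```r
library(data.table)

a <- data.table(a1 = c(0, 1), a2 = c(2, 3))

# Hypothetical row selection; in practice this could come from the
# partitioning step or from a lookup on the key
train_rows <- 1L

# Sub-assign by reference: only the selected rows receive the value,
# the remaining rows of the new column are filled with NA
a[train_rows, b1 := 4]

a$b1   # 4 for row 1, NA for row 2
```

Because the assignment happens in the i and j arguments of a itself, no subset copy is created and nothing has to be merged back.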