I want to do the same as explained here, i.e. adding missing rows to a data.table. The only additional difficulty I'm facing is that I want the number of key columns, i.e. those rows that are used for the self-join, to be flexible.
Here is a small example that basically repeats what is done in the link mentioned above:
df <- data.frame(fundID = rep(letters[1:4], each=6),
cfType = rep(c("D", "D", "T", "T", "R", "R"), times=4),
variable = rep(c(1,3), times=12),
value = 1:24)
DT <- as.data.table(df)
idCols <- c("fundID", "cfType")
setkeyv(DT, c(idCols, "variable"))
DT[CJ(unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))), nomatch=NA]
What bothers me is the last line. I want idCols
to be flexible (for instance if I use it within a function), so I don't want to type unique(df$fundID), unique(df$cfType)
manually. However, I just don't find any workaround for this. All my attempts to automatically split the subset of df
into vectors, as needed by CJ
, fail with the error message Error in setkeyv(x, cols, verbose = verbose) : Column 'V1' is type 'list' which is not (currently) allowed as a key column type.
CJ(sapply(df[, idCols], unique))
CJ(unique(df[, idCols]))
CJ(as.vector(unique(df[, idCols])))
CJ(unique(DT[, idCols, with=FALSE]))
I also tried building the expression myself:
str <- ""
for (i in idCols) {
str <- paste0(str, "unique(df$", i, "), ")
}
str <- paste0(str, "seq(from=min(variable), to=max(variable))")
str
[1] "unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))"
But then I don't know how to use str
. This all fails:
CJ(eval(str))
CJ(substitute(str))
CJ(call(str))
Does anyone know a good workaround?