[data.table_1.9.6] background of the question is that I am trying to build olap-like query features in a star-schema-like data layout, i.e. a large fact table and several meta tables. I am building a function wrapper around data.table join followed by an aggregation in a chain as so:
# dummy data
dt1 = data.table(id = 1:5, x=letters[1:5], a=11:15, b=21:25)
dt2 = data.table(k=11:15, z=letters[11:15])
# standard data.table query with ad-hoc key -> works fine
dt1[dt2, c("z") := .(i.z), with = F,
on = c(a="k")][, .(m = sum(a, na.rm = T),
count = .N), by = c("z")]
# wrapper function with setkey -> works fine
agg_foo <- function(x, meta_tbl, x_key, meta_key, agg_var) {
setkeyv(x, x_key)
setkeyv(meta_tbl, meta_key)
x[meta_tbl, (agg_var) := get(agg_var)][,.(a_sum = sum(a, na.rm=T),
count = .N),
by = c(agg_var)]
x[, (agg_var) := .(NULL)]
}
# call function (works fine)
agg_foo(x=dt1, meta_tbl=dt2, x_key="a", meta_key="k",agg_var="z")
# wrapper function with ad-hoc key -> does not work
agg_foo_ad_hoc <- function(x, meta_tbl, x_key, meta_key, agg_var) {
x[meta_tbl, (agg_var) := get(agg_var),
on = c(x_key = meta_key)][,.(a_sum = sum(a, na.rm=T),
count = .N), by = c(agg_var)]
x[, (agg_var) := .(NULL)]
}
# call function (causes error)
agg_foo_ad_hoc(x=dt1, meta_tbl=dt2, x_key="a", meta_key="k",agg_var="z")
Error in forderv(x, by = rightcols) :
'by' value -2147483648 out of range [1,4]
my guess is that I have to provide the ad-hoc "on" parameter in a different way. I tried on = c(get(x_key) = meta_key) but then he is complaining about unexpected brackets. I could go with setkey version of the function that works but I wonder if this is efficient given that the function will work on different meta tables depending on which attribute for the aggregation is used and thus constantly re-setting the key. or is the setkey always to be preferred? actual fact table (x here) has > 30 mln rows.