I have a data table that looks something like this:
library(data.table)
set.seed(1)
# Number of rows in the data table
obs <- 10^2
# Generate representative data
DT <- data.table(
  V1 = sample(x = 1:10, size = obs, replace = TRUE),
  V2 = sample(x = 11:20, size = obs, replace = TRUE),
  V3 = sample(x = 21:30, size = obs, replace = TRUE)
)
I also have a vectorized function, fn_calibrate, that calculates an output variable V4 based on an input variable opt:
fn_calibrate <- function(opt) {
  # Calculate some new value V4 that's dependent on opt
  DT[, V4 := V1 * sqrt(V2) / opt]
  # Calculate the residual sum of squares (RSS) between V4 and a target value V3
  DT[, rss := abs(V3 - V4)^2]
  # Return the RSS
  return(DT[, rss])
}
Now, I would like to perform a row-wise optimization using the optimize function, i.e. find the value of opt that minimizes the RSS for each row. I was hoping to achieve that with the data.table by = syntax, such as:
# Run the optimizer rowwise
DT[, opt := optimize(f = fn_calibrate, interval = c(0.1, 1), tol = .0015)$minimum, by = seq_len(nrow(DT))]
The code returns the error invalid function value in 'optimize' because the fn_calibrate function is currently written (DT[, ...]) to return a whole vector of rss of length nrow(DT), instead of a scalar for just one row at a time.
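To make the mismatch concrete, here is a minimal check, restating the data and function from above so it runs on its own. The trial value 0.5 for opt is arbitrary and chosen only for illustration:

```r
library(data.table)
set.seed(1)

# Same representative data as above
DT <- data.table(
  V1 = sample(x = 1:10, size = 100, replace = TRUE),
  V2 = sample(x = 11:20, size = 100, replace = TRUE),
  V3 = sample(x = 21:30, size = 100, replace = TRUE)
)

# Same function as above: it operates on the whole table
fn_calibrate <- function(opt) {
  DT[, V4 := V1 * sqrt(V2) / opt]
  DT[, rss := abs(V3 - V4)^2]
  return(DT[, rss])
}

# Evaluate once at an arbitrary trial value of opt
res <- fn_calibrate(0.5)

# One value per row of DT, whereas optimize expects a single number
length(res) == nrow(DT)
```

So even inside a by = group, every call to fn_calibrate still returns one value per row of the full table.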
My question is: is there a way to have fn_calibrate return row-wise results to the optimizer as well?
Edit
I realize a related question was asked and answered here in the context of a data frame, though the accepted answer uses a for loop, whereas I would rather use the efficient data.table by syntax, if possible. The reprex above is simple (100 rows), but my actual data table is larger (250K rows).
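For reference, the shape of call I am hoping for might look roughly like the sketch below. It assumes the objective can be rewritten to take the row's values as arguments instead of reading DT directly; fn_row and its arguments v1, v2, v3 are hypothetical names introduced only for this illustration. It relies on optimize forwarding extra named arguments to f. I have not checked how this scales to 250K rows.

```r
library(data.table)
set.seed(1)

# Same representative data as above
DT <- data.table(
  V1 = sample(x = 1:10, size = 100, replace = TRUE),
  V2 = sample(x = 11:20, size = 100, replace = TRUE),
  V3 = sample(x = 21:30, size = 100, replace = TRUE)
)

# Hypothetical scalar objective: squared residual for a single row
fn_row <- function(opt, v1, v2, v3) {
  (v3 - v1 * sqrt(v2) / opt)^2
}

# Within each single-row group, V1, V2 and V3 have length 1, so fn_row
# returns a scalar; the extra arguments are passed through by optimize
DT[, opt := optimize(f = fn_row, interval = c(0.1, 1), tol = .0015,
                     v1 = V1, v2 = V2, v3 = V3)$minimum,
   by = seq_len(nrow(DT))]
```

This still runs one optimize call per row, so it is not vectorized in the usual sense; it only avoids the explicit for loop.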