I have a data table that looks something like this:

library(data.table)
set.seed(1)

# Number of rows in the data table
obs <- 10^2

# Generate representative data
DT <- data.table(
  V1 = sample(x =  1:10, size = obs, replace = TRUE),
  V2 = sample(x = 11:20, size = obs, replace = TRUE),
  V3 = sample(x = 21:30, size = obs, replace = TRUE)
)

And a vectorized function fn_calibrate that calculates an output variable V4 based on an input variable opt:

fn_calibrate <- function(opt) {

  # Calculate some new value V4 that's dependent on opt
  DT[, V4 := V1 * sqrt(V2) / opt ]

  # Calculate the residual sum of squares (RSS) between V4 and a target value V3
  DT[, rss := abs(V3 - V4)^2]

  # Return the RSS
  return(DT[, rss])

}

Now, I would like to perform a rowwise optimization using the optimize function, i.e. find the value of opt that minimizes the RSS for each row.

I was hoping to achieve that with the data.table by = syntax, such as:

# Run the optimizer rowwise
DT[, opt := optimize(f = fn_calibrate, interval = c(0.1, 1), tol = .0015)$minimum, by = seq_len(nrow(DT))]

The code returns the error invalid function value in 'optimize' because fn_calibrate, as currently written (DT[, ...]), returns a whole vector of rss values of length nrow(DT) instead of a scalar for the single row being processed.
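
To show the failure in isolation, here is a minimal sketch in base R. The functions f_vector and f_scalar are hypothetical stand-ins for fn_calibrate (they are not part of the code above), and the numbers are hard-coded for illustration only.

```r
# Hypothetical stand-ins for fn_calibrate, using base R only.
# f_vector returns one value per row (three rows here), like the function above.
f_vector <- function(opt) {
  (c(21, 30, 24) - c(9, 4, 7) * sqrt(c(13, 20, 13)) / opt)^2
}

# f_scalar uses the values of a single row (V1 = 4, V2 = 20, V3 = 30).
f_scalar <- function(opt) (30 - 4 * sqrt(20) / opt)^2

# optimize() needs a length-one numeric from f; capture the error message
# instead of stopping the script.
msg <- tryCatch(
  optimize(f_vector, interval = c(0.1, 1), tol = .0015)$minimum,
  error = function(e) conditionMessage(e)
)
print(msg)

# The scalar-valued function is accepted.
res <- optimize(f_scalar, interval = c(0.1, 1), tol = .0015)
print(res$minimum)
```

So whatever is passed to optimize has to evaluate to a single number for each trial value of opt.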

My question is: is there a way to have fn_calibrate return rowwise results to the optimizer as well?

Edit

I realize a related question was asked and answered here in the context of a data frame, but the accepted answer uses a for loop, whereas I would rather use the efficient data.table by syntax if possible. The reprex above is small (100 rows), but my actual data table is larger (250K rows).

1 Answer

fn_calibrate doesn't need to be vectorized or to use data.table syntax.

You could pass V1, V2, V3 and opt as parameters and optimize over opt only:

fn_calibrate <- function(V1, V2, V3, opt) {

  # Calculate some new value V4 that's dependent on opt
  V4 <- V1 * sqrt(V2) / opt

  # Calculate the residual sum of squares (RSS) between V4 and a target value V3
  rss <- abs(V3 - V4)^2

  # Return the RSS
  return(rss)

}

DT[, opt := optimize(f = function(opt) fn_calibrate(V1, V2, V3, opt),
                     interval = c(0.1, 1), tol = .0015)$minimum,
   by = seq_len(nrow(DT))]

      V1    V2    V3       opt
     <int> <int> <int>     <num>
  1:     9    13    21 0.9990479
  2:     4    20    30 0.5962869
  3:     7    13    24 0.9992591
  4:     1    11    29 0.1142778
  5:     2    16    29 0.2756422
  6:     7    16    29 0.9656941
  7:     2    14    29 0.2578275
  8:     3    19    26 0.5028686
  9:     1    15    26 0.1490109
...
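
As a sanity check on these numbers, here is a hedged sketch that relies on an assumption specific to this example: because rss is (V3 - V1 * sqrt(V2) / opt)^2 with positive inputs, it is minimized where V1 * sqrt(V2) / opt equals V3, so the optimum should be V1 * sqrt(V2) / V3 clamped to the search interval. The column name opt_exact is made up for illustration, and this shortcut will not carry over to a calibration function without a closed-form minimum.

```r
library(data.table)
set.seed(1)

# Same representative data as in the question
obs <- 10^2
DT <- data.table(
  V1 = sample(x =  1:10, size = obs, replace = TRUE),
  V2 = sample(x = 11:20, size = obs, replace = TRUE),
  V3 = sample(x = 21:30, size = obs, replace = TRUE)
)

# Scalar objective for one row, as in the answer
fn_calibrate <- function(V1, V2, V3, opt) (V3 - V1 * sqrt(V2) / opt)^2

# Numeric row-wise optimization: inside each group V1, V2, V3 have length one
DT[, opt := optimize(f = function(opt) fn_calibrate(V1, V2, V3, opt),
                     interval = c(0.1, 1), tol = .0015)$minimum,
   by = seq_len(nrow(DT))]

# Closed-form minimizer (assumption stated above), clamped to [0.1, 1];
# this is fully vectorized and needs no by =
DT[, opt_exact := pmin(pmax(V1 * sqrt(V2) / V3, 0.1), 1)]

# Largest disagreement between the two columns
max_diff <- DT[, max(abs(opt - opt_exact))]
print(max_diff)
```

If the two columns agree to within roughly the tolerance passed to optimize, the row-wise call is behaving as intended. For a function like this one, the vectorized expression may also be a cheaper route on a 250K-row table.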
Waldi