
I am using MLflow from R to register a model that can be retrieved and invoked from a separate container. The mlflow_log_model() function takes a crated function (built with carrier::crate()) as its argument.

The TRCDetect model can be found on GitHub and relies on three separate R6Class objects (SESD, TRES and TCHA), which are sourced from two additional files, Transform.R and SESD.R.

Below is the carrier::crate() call:

library(carrier)  # provides crate()
predictor <- crate(
    function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR),
    TRCDetector=TRCDetector,
    SESD=SESD,
    TRES=TRES,
    TCHA=TCHA,
    TRAIN_SIZE=TRAIN_SIZE,
    DWIN=DWIN,
    RWIN=RWIN,
    ALPHA=ALPHA,
    MAXR=MAXR)

Printing the crate shows the size of each captured object:

<crate> 400.24 kB
* function: 14.38 kB
* `SESD`: 268.21 kB
* `TRES`: 227.38 kB
* `TCHA`: 198.78 kB
* `TRCDetector`: 38.52 kB
* `ALPHA`: 56 B
* `DWIN`: 56 B
* `MAXR`: 56 B
* `RWIN`: 56 B
* `TRAIN_SIZE`: 56 B
function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR)

I am able to store the crate in S3 and recover it from a separate container using mlflow_load_model().

Below is the recovered crate:

function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR)
<environment: 0x556e0ae93c38>
attr(,"class")

Unfortunately, I can only use the recovered crate if I first re-create the three dependent objects in the current environment, like so:

library(rlang)  # provides get_env(); p is the crate returned by mlflow_load_model()
TRES <- get("TRES", get_env(p))
TCHA <- get("TCHA", get_env(p))
SESD <- get("SESD", get_env(p))

Without this step, I get an 'object not found' error, which prevents me from serving the R model with the standard mlflow models serve command.
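One plausible cause (my reading, not something confirmed by the MLflow or carrier documentation) is that TRCDetector was defined at top level, so its enclosing environment is the global environment rather than the crate. R serializes the global environment only as a reference, so after loading in a new container TRCDetector would look up TRES, TCHA and SESD in that container's empty global environment instead of in the crate. The base-R sketch below reproduces this without MLflow or carrier. It assumes a crate behaves like a function whose environment is a child of the base environment holding the captured objects; helper and model are hypothetical stand-ins for the R6 helpers and TRCDetector.

```r
# Hypothetical stand-ins: 'helper' plays the role of TRES/TCHA/SESD,
# 'model' plays the role of TRCDetector. Both are defined at top level,
# so model's enclosing environment is the global environment.
helper <- function(v) v * 2
model <- function(v) helper(v)

# Emulate a crate (assumption): a child of the base environment holding copies
crate_env <- new.env(parent = baseenv())
crate_env$model <- model
crate_env$helper <- helper
predictor <- function(x) model(x)
environment(predictor) <- crate_env

# Simulate the fresh container: 'helper' no longer exists outside the crate
rm(helper)

# 'model' is found in the crate, but model's body looks up 'helper' in its
# own enclosing environment (the global one), where it is now missing
err <- tryCatch(predictor(1:3), error = function(e) conditionMessage(e))
print(err)  # an error message saying function "helper" could not be found
```

Copying the helpers back into the global environment makes that lookup succeed again, which is consistent with the get() workaround above.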

I have read related questions about R environments, but I cannot come up with a mechanism that makes the model usable immediately after the mlflow_load_model() call.

Writing an Rserve wrapper that revives the three dependent objects is possible, but not ideal. Am I missing something in my crate() call?

EDIT: I had a look at the often-cited page "Deploying R Models with MLflow and Docker", specifically "Option 2: Don’t install the package". Although I did not fully follow its suggestion of using set_env(), I tried it as follows:

predictor <- crate(
    function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR),
    TRCDetector=set_env(TRCDetector),
    SESD=set_env(SESD),
    TRES=set_env(TRES),
    TCHA=set_env(TCHA),
    TRAIN_SIZE=TRAIN_SIZE,
    DWIN=DWIN,
    RWIN=RWIN,
    ALPHA=ALPHA,
    MAXR=MAXR)

This results in the following error:

> p(data.frame(timestamps=rep(1:500,1),value=v))
Error in TRCDetector(data = x$value, time = x$timestamps, train_size = TRAIN_SIZE,  :
  attempt to apply non-function
jdevoo

Answer


Found a solution for now: I modified the TRCDetector model function so that the three helper classes are passed in as arguments (tcha, sesd and tres).

predictor <- crate(
    function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR, tcha=TCHA, sesd=SESD, tres=TRES),
    TRCDetector=TRCDetector,
    TCHA=TCHA,
    SESD=SESD,
    TRES=TRES,
    TRAIN_SIZE=TRAIN_SIZE,
    DWIN=DWIN,
    RWIN=RWIN,
    ALPHA=ALPHA,
    MAXR=MAXR)
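The workaround likely succeeds because an argument such as tcha=TCHA is evaluated in the crated function's own call frame, whose parent is the crate environment, so the helpers are found there and handed to TRCDetector directly. An alternative that would keep TRCDetector's original signature is to re-point its enclosing environment at the crate when the crate is called. The base-R sketch below shows both ideas on an emulated crate; helper, model and model_with_arg are hypothetical stand-ins for the R6 helpers and TRCDetector, and the second approach has not been tried with carrier, MLflow or R6 (if the R6 classes reference one another, their generators would probably need a similar fix).

```r
# Hypothetical stand-ins: 'helper' ~ TRES/TCHA/SESD, 'model' ~ TRCDetector
helper <- function(v) v * 2
model <- function(v) helper(v)                         # finds 'helper' via its enclosing env
model_with_arg <- function(v, helper_fn) helper_fn(v)  # helper passed in, as in the answer

# Emulated crate (assumption): a child of the base environment holding copies
crate_env <- new.env(parent = baseenv())
crate_env$helper <- helper
crate_env$model <- model
crate_env$model_with_arg <- model_with_arg

# Approach 1 (the answer's workaround): pass the helper as an argument.
# The promise 'helper' is evaluated in the call frame, whose parent is the crate.
predict_arg <- function(x) model_with_arg(x, helper_fn = helper)
environment(predict_arg) <- crate_env

# Approach 2 (not verified with carrier/MLflow): give a local copy of 'model'
# the call frame as its environment, so its lookups fall through to the crate
predict_env <- function(x) {
  environment(model) <- environment()
  model(x)
}
environment(predict_env) <- crate_env

# Simulate the fresh container: remove the originals outside the crate
rm(helper, model, model_with_arg)

res_arg <- predict_arg(1:3)
res_env <- predict_env(1:3)
print(res_arg)
print(res_env)
```

Transposed to the real model, approach 2 would roughly mean starting the crated function with environment(TRCDetector) <- environment() before calling TRCDetector; treat this as an untested suggestion rather than a confirmed fix.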

From a separate container, I can now load and invoke the model directly:

> library(mlflow)
> p<-mlflow_load_model('s3://models/jovyan/eb4933b3ee354f95b6620553a0f6d780/artifacts/TRCDetector')
> p
function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR, tcha=TCHA, sesd=SESD, tres=TRES)
<environment: 0x55fcaa9c8188>
attr(,"class")
[1] "crate"
> v<-seq(1,500)
> v[sample(v,10)]=0
> p(data.frame(timestamps=seq(1,500),value=v))
 [1] 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169
[20] 194 202 240 246 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302
[39] 303 304 305 306 321 357 425 426 427 428 429 430 431 432 433 434 435 436 437
[58] 438 439 440 441 442 443
jdevoo