I am using mlflow from R to register a model which can be retrieved and invoked from a separate container. The mlflow_log_model() function takes a crated function as argument.
The TRCDetect model can be found on GitHub and relies on 3 separate R6Class objects which are sourced from 2 additional files, Transform.R and SESD.R.
Below is the carrier::crate call.
library(carrier)  # provides crate()

predictor <- crate(
  function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR),
  TRCDetector=TRCDetector,
  SESD=SESD,
  TRES=TRES,
  TCHA=TCHA,
  TRAIN_SIZE=TRAIN_SIZE,
  DWIN=DWIN,
  RWIN=RWIN,
  ALPHA=ALPHA,
  MAXR=MAXR)
Below is the size breakdown of the resulting crate:
<crate> 400.24 kB
* function: 14.38 kB
* `SESD`: 268.21 kB
* `TRES`: 227.38 kB
* `TCHA`: 198.78 kB
* `TRCDetector`: 38.52 kB
* `ALPHA`: 56 B
* `DWIN`: 56 B
* `MAXR`: 56 B
* `RWIN`: 56 B
* `TRAIN_SIZE`: 56 B
function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR)
I am able to stash the crate in S3 and recover it from a separate container using mlflow_load_model().
Below is the recovered crate
function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR)
<environment: 0x556e0ae93c38>
attr(,"class")
Unfortunately, I am only able to use the recovered crate (p below) if I re-create the 3 dependent objects in the current environment, using rlang's get_env(), like so:
TRES <- get("TRES", get_env(p))
TCHA <- get("TCHA", get_env(p))
SESD <- get("SESD", get_env(p))
Without this, I get an 'Object not found' error.
This prevents the use of the R model with the standard mlflow models serve.
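The symptom above can be reproduced in base R without mlflow, carrier or R6. The sketch below is only an analogy: helper and caller are hypothetical stand-ins for the dependent classes and TRCDetector, and session_env and crate_env are made-up names for the defining session and the crate's environment. It shows that copying an object into an isolated environment does not change where that object's own code looks up its dependencies.

```r
# Hypothetical stand-in for the session in which the objects were defined.
session_env <- new.env(parent = baseenv())
evalq({
  helper <- function(x) x + 1          # plays the role of SESD / TRES / TCHA
  caller <- function(x) helper(x) * 2  # plays the role of TRCDetector
}, session_env)

# Mimic a crate: an isolated environment holding copies of both objects.
crate_env <- new.env(parent = baseenv())
crate_env$helper <- session_env$helper
crate_env$caller <- session_env$caller
predictor <- function(x) caller(x)
environment(predictor) <- crate_env

# Simulate a fresh session in which the helper is no longer defined.
rm("helper", envir = session_env)

# caller still resolves helper through session_env, not crate_env, so this fails.
broken <- tryCatch(predictor(1), error = function(e) conditionMessage(e))
print(broken)

# Pointing the copied function at the crate's environment restores the lookup.
environment(crate_env$caller) <- crate_env
fixed <- predictor(1)
print(fixed)
```

Whether the same mechanism applies to the R6 generators here is an assumption: R6 generators are environments rather than closures, and they keep their own reference to the environment in which R6Class() was called, so the equivalent adjustment would not necessarily look like the last step above.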
I have looked into R environments in related questions but I am failing to come up with a mechanism which ensures the model is usable straight after the mlflow_load_model() call.
The option of creating an Rserve wrapper which lets me revive the 3 dependent objects is not ideal. Am I missing something when I call crate?
EDIT: I had a look at the often-cited page "Deploying R Models with MLflow and Docker", under "Option 2: Don't install the package". While I didn't quite follow the suggestion of passing set_env(), I tried it as follows:
predictor <- crate(
function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR),
TRCDetector=set_env(TRCDetector),
SESD=set_env(SESD),
TRES=set_env(TRES),
TCHA=set_env(TCHA),
TRAIN_SIZE=TRAIN_SIZE,
DWIN=DWIN,
RWIN=RWIN,
ALPHA=ALPHA,
MAXR=MAXR)
Which results in the following error:
> p(data.frame(timestamps=rep(1:500,1),value=v))
Error in TRCDetector(data = x$value, time = x$timestamps, train_size = TRAIN_SIZE, :
attempt to apply non-function
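One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: set_env() re-homes closures, but when it is given an environment (which is what an R6 class generator is) it may return the target environment instead of the generator. If that is what happens here, a lookup such as SESD$new inside the crate would yield NULL, and calling NULL produces exactly this message. The base R sketch below reproduces only that final step; fake_generator is a hypothetical stand-in, not one of the real classes.

```r
# Hypothetical stand-in for what a generator might become after set_env():
# a plain environment that has no `new` method.
fake_generator <- new.env()

# `$` on an environment returns NULL for a missing name, so this calls NULL.
msg <- tryCatch(
  fake_generator$new(),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Checking is.function() or is.environment() on each object after it has been wrapped in set_env() would confirm or rule out this explanation.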