Consider this simple example, which sets up a classification pipeline
(a decision tree, in this case) on some textual data:
library(sparklyr)
library(dplyr)

# `sc` is assumed to be an existing Spark connection (e.g. from spark_connect())
dtrain <- tibble(text = c("Chinese Beijing Chinese",
                          "Chinese Chinese Shanghai",
                          "Chinese Macao",
                          "Tokyo Japan Chinese"),
                 doc_id = 1:4,
                 class = c(1, 1, 1, 0))
dtrain_spark <- copy_to(sc, dtrain, overwrite = TRUE)

# Tokenize the text, turn the tokens into count vectors, then classify
pipeline <- ml_pipeline(
  ft_tokenizer(sc, input_col = "text", output_col = "tokens"),
  ft_count_vectorizer(sc, input_col = "tokens", output_col = "myvocab"),
  ml_decision_tree_classifier(sc, label_col = "class",
                              features_col = "myvocab",
                              prediction_col = "pcol",
                              probability_col = "prcol",
                              raw_prediction_col = "rpcol")
)
The issue is that I fit several models in a loop and get some results from each, but I would also like to save the fitted models in a list (or any other structure that lets me use each model separately later on).
I tried the usual technique: set up an empty list and add each model to it as it is created. Unfortunately, this does not work, as illustrated below:
model_list <- list()
fitmodel <- function(sc, string){
  print(paste('this is iteration', string))
  model <- ml_fit(pipeline, dtrain_spark)
  model_list[[string]] <- model
  #do some other stuff with the model
}
purrr::map(c('stack', 'over', 'flow'), ~ fitmodel(sc, .))
[1] "this is iteration stack"
[1] "this is iteration over"
[1] "this is iteration flow"
However, my list is still empty! :(
> model_list
list()
What is wrong here? What can be done? I would like to avoid writing to disk if possible.
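For what it's worth, the symptom does not seem to depend on Spark. Below is a minimal sketch in plain R, where `fake_fit()` is a hypothetical stand-in for `ml_fit()` (it is not part of sparklyr) and `fit_local()` / `fit_return()` are names made up for illustration. The first half reproduces the empty list, which makes me suspect that the assignment inside the function only modifies a local copy of `model_list`. The second half returns each model from the function and lets the caller build the list, which appears to work here, although I have not verified it with real Spark pipeline models.

```r
# Hypothetical stand-in for ml_fit(): returns a plain list instead of a Spark model
fake_fit <- function(name) list(name = name, fitted = TRUE)

keys <- c("stack", "over", "flow")

# 1) Same pattern as in the question: assign into model_list inside the function
model_list <- list()
fit_local <- function(string) {
  # This creates a *local* variable named model_list; the global one is untouched
  model_list[[string]] <- fake_fit(string)
  invisible(NULL)
}
for (k in keys) fit_local(k)
length(model_list)  # 0: the global list is still empty

# 2) Alternative: return the model and collect the results in the caller
fit_return <- function(string) {
  model <- fake_fit(string)
  # ... do some other stuff with the model ...
  model
}
# Naming the input vector makes lapply() return a list named by key
models <- lapply(setNames(keys, keys), fit_return)
names(models)          # "stack" "over" "flow"
models[["over"]]$name  # "over"
```

With this layout everything stays in memory, and each model can be pulled out later by name, e.g. `models[["stack"]]`.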
Thanks!