I am trying to create and apply a Spark ml_pipeline object that can handle an external parameter that will vary (typically a date). According to the Spark documentation, it seems possible: see the part about ParamMap here.
I haven't figured out exactly how to do it yet. I was thinking of something like this:
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

table.df <- data.frame(a = c(1, 2, 3))
table.sdf <- sdf_copy_to(sc, table.df)

param <- 5
param2 <- 4

# operation declaration
table2.sdf <- table.sdf %>%
  mutate(test = param)

# pipeline creation
pipeline_1 <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(table2.sdf) %>%
  ml_fit(table.sdf, list("param" = param))

# pipeline application with another value for param
table2.sdf <- pipeline_1 %>%
  ml_transform(table.sdf, list("param" = param2))

# result
glimpse(table2.sdf %>% select(test))
# doesn't work...
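
For illustration, here is a minimal sketch of the fallback I would like to avoid: rebuilding the pipeline for every parameter value. My understanding (possibly wrong) is that ft_dplyr_transformer freezes the translated SQL at creation time, so the parameter would have to be interpolated into the SQL before the pipeline exists. The make_pipeline helper name and the SELECT statement are just illustrative, not part of any sparklyr API.

# hypothetical helper: one freshly built pipeline per parameter value
make_pipeline <- function(sc, param_value) {
  ml_pipeline(sc) %>%
    ft_sql_transformer(
      # __THIS__ is the SQLTransformer placeholder for the input dataset
      paste0("SELECT *, ", param_value, " AS test FROM __THIS__")
    )
}

# usage: each parameter value needs its own fit, which defeats the purpose
model_5 <- make_pipeline(sc, 5) %>% ml_fit(table.sdf)
model_4 <- make_pipeline(sc, 4) %>% ml_fit(table.sdf)
glimpse(ml_transform(model_4, table.sdf) %>% select(test))

What I am after instead is a single fitted pipeline whose parameter can be overridden at transform time, the way the Spark documentation describes with a ParamMap.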