I have a weka model stored in S3 which is of size around 400MB. Now, I have some set of record on which I want to run the model and perform prediction.
For performing prediction, What I have tried is,
Download and load the model on driver as a static object , broadcast it to all executors. Perform a map operation on prediction RDD. ----> Not working, as in Weka for performing prediction, model object needs to be modified and broadcast require a read-only copy.
Download and load the model on driver as a static object and send it to executor in each map operation. -----> Working (Not efficient, as in each map operation, i am passing 400MB object)
Download the model on driver and load it on each executor and cache it there. (Don't know how to do that)
Does someone have any idea how can I load the model on each executor once and cache it so that for other records I don't load it again?