I am trying to import a POJO model into Sparkling Water. Currently I compile the model with:
javac -cp /opt/bitnami/commons/pojo.jar -J-Xmx2g -J-XX:MaxPermSize=256m /opt/bitnami/commons/GBM_model_python_1642760589977_1.java
After that, I load it through hex.genmodel.GenModel, roughly like this:
import java.io.File
import java.net.{URL, URLClassLoader}
import hex.genmodel.GenModel

val classLocation = new File("/opt/bitnami/commons/").toURI.toURL
val location = Array[URL](classLocation)
val classLoader = new URLClassLoader(location, classOf[GenModel].getClassLoader)
val cls = Class.forName("GBM_model_python_1642760589977_1", true, classLoader)
val model: GenModel = cls.newInstance().asInstanceOf[GenModel]
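For context, the compile-and-load steps above can also be done entirely inside the JVM via the javax.tools API, without shelling out to javac (the source could even be fetched from S3 into a temp directory first). This is only a minimal plain-Java sketch: the class name `Hello` and helper `LoadCompiled` are made up for illustration, not part of the H2O API, and it assumes a JDK (not just a JRE) at runtime.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class LoadCompiled {
    // Writes `source` to dir/<className>.java, compiles it with the
    // in-process compiler, then loads and instantiates the class.
    public static Object compileAndLoad(Path dir, String className, String source)
            throws Exception {
        Path src = dir.resolve(className + ".java");
        Files.write(src, source.getBytes());
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // For a real H2O POJO you would also pass "-cp", "/path/to/h2o-genmodel.jar"
        int rc = compiler.run(null, null, null, src.toString());
        if (rc != 0) throw new IllegalStateException("compilation failed");
        try (URLClassLoader loader = new URLClassLoader(
                new URL[]{dir.toUri().toURL()}, LoadCompiled.class.getClassLoader())) {
            Class<?> cls = Class.forName(className, true, loader);
            return cls.getDeclaredConstructor().newInstance();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("pojo");
        Object o = compileAndLoad(dir, "Hello",
                "public class Hello { public String toString() { return \"hello\"; } }");
        System.out.println(o); // prints "hello"
    }
}
```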
The problem is that when making predictions I run into trouble with the URLClassLoader:
val easyModel = new EasyPredictModelWrapper(model)
classLoader.close()
val header = model.getNames
val outputType = easyModel.getModelCategory
val predictionRdd = testData.rdd.map(row => {
  val r = new RowData
  header.indices.foreach(idx => r.put(header(idx), row.getDouble(idx).asInstanceOf[AnyRef]))
  easyModel.predictMultinomial(r)
})
Throwing the exception:
org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: java.net.URLClassLoader
Serialization stack:
I don't know why, since I don't think the URLClassLoader is actually used inside the closure. I tried calling classLoader.close() before the map to work around it, but that didn't help.
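For what it's worth, this exception usually means the closure passed to map drags in more than the lambda body visibly uses: if the surrounding scope (a class or REPL line object) holds the classLoader as a field, the whole enclosing object can get pulled into the serialized task graph. A small plain-Java sketch of the same pitfall, with made-up names (NonSerializableLoader stands in for URLClassLoader):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureCapture {
    // A function type that is also Serializable, like Spark closures must be
    public interface SerFunc<A, B> extends Function<A, B>, Serializable {}

    // Stand-in for a non-serializable resource such as URLClassLoader
    public static class NonSerializableLoader {}

    public static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        NonSerializableLoader loader = new NonSerializableLoader();
        String modelName = "GBM_model"; // a plain serializable value

        // This lambda captures the non-serializable loader, so serializing it
        // fails -- the same shape of failure Spark reports for its tasks.
        SerFunc<String, String> bad = row -> loader.toString() + row;
        try {
            serialize(bad);
            System.out.println("bad: serialized");
        } catch (NotSerializableException e) {
            System.out.println("bad: " + e);
        }

        // This lambda captures only a serializable String, so it serializes fine.
        SerFunc<String, String> good = row -> modelName + row;
        serialize(good);
        System.out.println("good: serialized");
    }
}
```

The usual workaround is to make sure everything the closure touches is serializable (e.g. copy needed values into local vals before the map) rather than closing the class loader, which doesn't remove the reference from the captured scope.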
My questions are:

1. Is there an easier way to import POJO models into Sparkling Water?
2. If this is the recommended way: right now I am compiling the model locally, but I need to store the models in S3. Is there any way to load the model without compiling it locally, e.g. by compiling it in memory?
3. How can I fix the serialization issue?