
To create a Spark ML object, we just need to know:

  • The type of model
  • The parameters for the model

I am brainstorming a way to pass this information as JSON and instantiate a Spark ML object from it.

For example, given this JSON:

{
    "model": "RandomForestClassifier",
    "numTrees": 10,
    "featuresCol": "binaryFeatures"
}

it would instantiate a random forest model:

val rf = new RandomForestClassifier().setNumTrees(10).setFeaturesCol("binaryFeatures")
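One way to start is a hand-rolled pattern match over the model name. Here is a minimal, self-contained sketch of the dispatch idea; a stub class stands in for the real Spark ML `RandomForestClassifier` so it runs without Spark on the classpath, but the cases would construct the real estimators in practice (`fromConfig` and its key names are my own invention):

```scala
// Stand-in stub for org.apache.spark.ml.classification.RandomForestClassifier,
// mimicking its fluent setters. With Spark available, delete this and import
// the real class instead.
class RandomForestClassifier {
  var numTrees: Int = 20
  var featuresCol: String = "features"
  def setNumTrees(n: Int): this.type = { numTrees = n; this }
  def setFeaturesCol(c: String): this.type = { featuresCol = c; this }
}

// Dispatch on the "model" field of an already-parsed JSON object
// (represented here as a Map from a JSON library such as json4s).
def fromConfig(config: Map[String, String]): Any =
  config("model") match {
    case "RandomForestClassifier" =>
      val rf = new RandomForestClassifier()
      // Apply each parameter only if it is present in the config.
      config.get("numTrees").foreach(n => rf.setNumTrees(n.toInt))
      config.get("featuresCol").foreach(rf.setFeaturesCol)
      rf
    case other =>
      throw new IllegalArgumentException(s"Unknown model type: $other")
  }

val rf = fromConfig(Map(
  "model" -> "RandomForestClassifier",
  "numTrees" -> "10",
  "featuresCol" -> "binaryFeatures"
)).asInstanceOf[RandomForestClassifier]
```

The obvious downside is exactly the one raised above: every new estimator needs a new `case`, and every new parameter a new `config.get` line, so the match grows with the API surface.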

It is fairly straightforward to write a custom JSON serializer/deserializer on my own. Scala's pattern matching seems like a good fit for dynamically instantiating an object from its name as a string. However, once the object gets more complex (e.g. supporting pipelines), a custom serializer becomes hard to maintain.

Is there any existing implementation for this? If not, what should the json structure look like?
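For the pipeline case, one structure I have been considering is an ordered array of stages, each shaped like the single-model object above (all key names here are just my invention, not an existing format):

```json
{
  "pipeline": [
    { "model": "Tokenizer", "inputCol": "text", "outputCol": "words" },
    { "model": "HashingTF", "inputCol": "words", "outputCol": "features" },
    { "model": "RandomForestClassifier", "numTrees": 10, "featuresCol": "features" }
  ]
}
```

The array order would map directly to the stage order passed to `Pipeline.setStages`, but I am not sure whether this nesting scales to stages that themselves contain models (e.g. cross-validation).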

gyoho
  • do you have to use scala? I only mention because python's reflection capabilities are much easier imo – James Tobin Jan 31 '17 at 17:39
  • I guess I can use pyspark-submit too. Do you have some cool resources about python's reflection? – gyoho Jan 31 '17 at 18:04
  • python is inherently reflective as it's a pure scripting language instead of a compiled language, so I'm not sure how to go into depth there. http://stackoverflow.com/questions/1183645/eval-in-scala is a nice starting point to start using reflection in Scala. I'm stuck with old versions of spark/scala, so it might be a LOT easier with more recent versions – James Tobin Jan 31 '17 at 19:28

0 Answers