I'm writing a piece of Scala code that constructs a Spark ML pipeline from a config file. I want to be able to instantiate any object that extends the Params class (i.e. any PipelineStage), as well as the Pipeline itself.

pipeline {
    class = "org.apache.spark.ml.Pipeline"
    stages = ["pca", "vectorAssembler"]
    vectorAssembler {
        class = "org.apache.spark.ml.feature.VectorAssembler"
        inputCols = ["pcacol","col1","col2","col3"]
        outputCol = "features"
    }
    pca {
    ....
    }
}

At the moment I instantiate the class and set its parameters by calling the Params#set method. I want the parser to be as general as possible, so that it can set parameters of any type, including primitives, arrays of primitives and arrays of objects (e.g. Pipeline#stages). The problem is that I can't distinguish the type parameter of the Param that the setter expects. I look at the type of the required parameter and cast the config value to that type.

    param match {
      case p: DoubleArrayParam =>
        ...
      case p: IntArrayParam =>
        ...
      case p: StringArrayParam =>
        ...
      // unchecked: the type argument is erased, so this matches every Param[_]
      case p: Param[Array[Params]] =>
        ...
      case p =>
        ...
    }

At runtime Param[String], Param[Array[String]] and Param[Array[Params]] are the same class because of type erasure. For string arrays and arrays of primitives there are separate classes (DoubleArrayParam, IntArrayParam, StringArrayParam), but I can't find a way to tell an array of Params from a simple String, because any Param[_] matches the penultimate case in the code above.
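
To make the erasure problem concrete, here is a minimal sketch that runs without Spark. The `Param` and `Params` definitions are hypothetical stand-ins for the Spark classes, and `describe` is an illustrative helper, not part of any library.

```scala
object ErasureDemo {
  // Hypothetical stand-ins for Spark's Param and Params, so this compiles without Spark.
  class Param[T](val name: String)
  trait Params

  // The type argument is erased at runtime, so the first case only checks
  // "is this a Param?" and therefore matches every Param instance.
  def describe(p: Param[_]): String = p match {
    case _: Param[Array[Params] @unchecked] => "array of Params"
    case _                                  => "something else"
  }
}
```

Calling `ErasureDemo.describe(new ErasureDemo.Param[String]("outputCol"))` takes the first branch even though the parameter holds a plain String, which is the behaviour described above.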

The only solution I have come up with is to parse the Pipeline config separately, but that means I might run into other special cases in the future.

Kal-ko
  • I hope that you can find some useful information here: https://stackoverflow.com/questions/1094173/how-do-i-get-around-type-erasure-on-scala-or-why-cant-i-get-the-type-paramete – Hosam Aly Dec 10 '17 at 20:28
  • @HosamAly thanks for the link, I learned a lot from it. Unfortunately it didn't help much in this case as I found my *Param[_]* objects to be type-erased already. As a result I had to use reflection to extract types from corresponding fields – Kal-ko Dec 12 '17 at 08:17
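
The reflection approach mentioned in the last comment could look roughly like the sketch below. It is an assumption about how such a lookup might be written, not the code actually used: `Param`, `Params` and `FakePipeline` are hypothetical stand-ins so that the example runs without Spark, and `declaredTypeArg` and `isParamsArray` are illustrative helper names. It relies on the Scala compiler recording the declared generic type of each `val` accessor in the class file, which Java reflection exposes through `getGenericReturnType`.

```scala
import java.lang.reflect.{GenericArrayType, ParameterizedType, Type}

object ParamTypes {
  // Hypothetical stand-ins for Spark's Param and Params.
  class Param[T](val name: String)
  trait Params

  // Hypothetical stage with one array-of-Params parameter and one String parameter.
  class FakePipeline extends Params {
    val stages: Param[Array[Params]] = new Param[Array[Params]]("stages")
    val label: Param[String] = new Param[String]("label")
  }

  // Declared type argument of the Param returned by the accessor `name`, if any.
  // The Param instance is erased, but the accessor's generic signature is not.
  def declaredTypeArg(owner: Class[_], name: String): Option[Type] =
    owner.getMethod(name).getGenericReturnType match {
      case p: ParameterizedType => p.getActualTypeArguments.headOption
      case _                    => None // e.g. a non-generic subclass of Param
    }

  // True when the declared type argument is an array whose elements are Params.
  def isParamsArray(t: Type): Boolean = t match {
    case c: Class[_] if c.isArray =>
      classOf[Params].isAssignableFrom(c.getComponentType)
    case g: GenericArrayType =>
      g.getGenericComponentType match {
        case c: Class[_] => classOf[Params].isAssignableFrom(c)
        case _           => false
      }
    case _ => false
  }
}
```

With these helpers, `ParamTypes.declaredTypeArg(classOf[ParamTypes.FakePipeline], "stages").exists(ParamTypes.isParamsArray)` distinguishes the `stages` parameter from `label`. One caveat under this approach: type arguments that are Scala primitives (such as `Param[Int]`) are generally not recoverable this way, so the dedicated classes like IntArrayParam would still need their own cases.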
