H2O Sparkling Water AutoML not working properly in Spark Scala. Exception: Ease restrictions on setMaxModels or setMaxRuntimeSecs

Question

I'm trying out H2O Sparkling Water AutoML from Scala, running it on my laptop on localhost. I'm not setting any restrictions on the H2OAutoML class through setMaxModels or setMaxRuntimeSecs, yet the fit call fails with an exception asking me to ease those same restrictions:

Exception in thread "main" ai.h2o.sparkling.ml.algos.H2OAutoML$$anon$1: No model returned from H2O AutoML. For example, try to ease your 'excludeAlgo', 'maxModels' or 'maxRuntimeSecs' properties.

Update: The dataset I'm using, diabeties.csv, is a classification dataset. If I set the sort metric to AUTO with setSortMetric, the fit succeeds and no exception is thrown, but AutoML performs regression on the dataset instead of classification.

Here is the code:

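import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
// Assumption: this package matches older Sparkling Water releases;
// newer releases expose the context as ai.h2o.sparkling.H2OContext.
import org.apache.spark.h2o.H2OContext
import ai.h2o.sparkling.ml.algos.H2OAutoML

object H2OAutoMLExample {
  def main(args: Array[String]): Unit = {
    println("H2O AutoML")
    println("Creating Spark Session..")
    val sparkConf = new SparkConf().setAppName("H2OAutoML").setMaster("local[*]")
      .set("spark.ext.h2o.repl.enabled", "false")
      .set("spark.driver.host", "localhost")
    val sparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
    // Start (or reuse) the H2O cluster inside the local Spark application
    val hc = H2OContext.getOrCreate(sparkSession.sparkContext)
    val df = sparkSession.read
      .option("header", true)
      .option("inferschema", true)
      .csv("/Datasets/diabeties.csv")

    df.show()

    // Print the inferred type of every column, including the label
    df.schema.fields.foreach(x => println(x.dataType))
    val Array(trainingDF, testingDF) = df.randomSplit(Array(0.8, 0.2))
    val automl = new H2OAutoML()
    automl.setLabelCol("diabetes")
    automl.setSortMetric("logloss")
    // This call throws "No model returned from H2O AutoML"
    val model = automl.fit(trainingDF)
    println(model.getModelDetails())
  }
}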
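
One possible explanation, consistent with the update above: with inferschema enabled, a 0/1 label column is read as a numeric type, and Sparkling Water's AutoML picks regression or classification from the label column's type, so a classification-only metric such as logloss may leave it with no model to return. The following is a minimal sketch, not a confirmed fix, that casts the label to a string before training. The helper name withStringLabel, the object name, and the toy "glucose" column are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StringType

object LabelAsCategorical {
  // Hypothetical helper: cast the label column to string so that it is
  // no longer a numeric column when handed to AutoML.
  def withStringLabel(df: DataFrame, labelCol: String): DataFrame =
    df.withColumn(labelCol, col(labelCol).cast(StringType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LabelAsCategorical")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy rows standing in for the real CSV; the values are made up.
    val df = Seq((148, 1), (85, 0)).toDF("glucose", "diabetes")
    val fixed = withStringLabel(df, "diabetes")

    // The label column should now be reported as a string column.
    fixed.printSchema()
    spark.stop()
  }
}
```

In the code from the question, the same cast would be applied to df before randomSplit, so that trainingDF already carries the string label when it reaches automl.fit.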

