How to extract best parameters from a CrossValidatorModel

Question

I want to find the parameters of ParamGridBuilder that make the best model in CrossValidator in Spark 1.4.x.

In the Pipeline Example in the Spark documentation, they add different parameters (numFeatures, regParam) by using ParamGridBuilder in the Pipeline. Then the following line of code produces the best model:

val cvModel = crossval.fit(training.toDF)

Now I want to know which parameters (numFeatures, regParam) from ParamGridBuilder produce the best model.

I already used the following commands without success:

cvModel.bestModel.extractParamMap().toString()
cvModel.params.toList.mkString("(", ",", ")")
cvModel.estimatorParamMaps.toString()
cvModel.explainParams()
cvModel.getEstimatorParamMaps.mkString("(", ",", ")")
cvModel.toString()

Any help?

Thanks in advance,

Best parameters are [dumped to log](https://github.com/apache/spark/blob/a721ee52705100dbd7852f80f92cde4375517e48/mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala#L104) but beats me how you can access this information from a `CrossValidatorModel` instance. — zero323, Jul 31 '15 at 16:56
That's really frustrating. They aren't even logging it in PySpark. Such a small but important thing that's lacking... it makes me wonder if anyone is actually using this functionality. — Zach Garner, Oct 27 '15 at 17:55
folks, any solution for this problem in the recent versions of Spark? — Rami, Oct 29 '15 at 16:13
you definitely can get it from `cvModel.bestModel`, please see my answer below — Algorithman, Mar 22 '18 at 03:57
[This SO thread](https://stackoverflow.com/questions/45225246/how-to-access-parameters-of-the-underlying-model-in-ml-pipeline) kinda answers the question. — panc, Apr 24 '20 at 20:24

score 20 · Answer 1 · edited May 29 '16 at 15:12

One method to get a proper ParamMap object is to use CrossValidatorModel.avgMetrics: Array[Double] to find the argmax ParamMap:

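import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.CrossValidatorModel

implicit class BestParamMapCrossValidatorModel(cvModel: CrossValidatorModel) {
  // Pair each candidate ParamMap with its average metric and keep the highest-scoring one
  def bestEstimatorParamMap: ParamMap = {
    cvModel.getEstimatorParamMaps
           .zip(cvModel.avgMetrics)
           .maxBy(_._2)
           ._1
  }
}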

Running this on the CrossValidatorModel trained in the Pipeline Example you cited gives:

scala> println(cvModel.bestEstimatorParamMap)
{
   hashingTF_2b0b8ccaeeec-numFeatures: 100,
   logreg_950a13184247-regParam: 0.1
}

edited May 29 '16 at 15:12

Alexey Pechorin

answered Jan 08 '16 at 00:47

Adam Vogel

Note: `maxBy` might need to be `minBy`, depending on the value of `Evaluator.isLargerBetter`. – metasim Nov 16 '16 at 00:39
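
The point about `maxBy` versus `minBy` can be sketched as a small helper that takes the direction as an argument. This is only an illustration in plain Scala: the names `BestParams` and `best` are made up for this sketch and are not part of Spark, and `largerIsBetter` stands in for the value of `Evaluator.isLargerBetter`.

```scala
object BestParams {
  // Given (candidate, metric) pairs, return the candidate with the best metric.
  // `largerIsBetter` plays the role of Evaluator.isLargerBetter:
  // true for metrics such as accuracy or AUC, false for errors such as RMSE.
  def best[P](scored: Seq[(P, Double)], largerIsBetter: Boolean): P = {
    require(scored.nonEmpty, "need at least one scored candidate")
    val (params, _) =
      if (largerIsBetter) scored.maxBy(_._2) else scored.minBy(_._2)
    params
  }
}
```

With a fitted model, a call along the lines of `BestParams.best(cvModel.getEstimatorParamMaps.zip(cvModel.avgMetrics).toSeq, cvModel.getEvaluator.isLargerBetter)` should then return the winning ParamMap, assuming a Spark version in which `isLargerBetter` is available on the evaluator.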

score 13 · Answer 2 · answered Nov 11 '15 at 14:27

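import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.feature.HashingTF

// The best model of the Pipeline Example is a fitted pipeline; cast it to reach its stages
val bestPipelineModel = cvModel.bestModel.asInstanceOf[PipelineModel]
val stages = bestPipelineModel.stages

// Stage 1 is the HashingTF transformer
val hashingStage = stages(1).asInstanceOf[HashingTF]
println("numFeatures = " + hashingStage.getNumFeatures)

// Stage 2 is the fitted logistic regression
val lrStage = stages(2).asInstanceOf[LogisticRegressionModel]
println("regParam = " + lrStage.getRegParam)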

score 4 · Answer 3 · edited Mar 10 '20 at 01:36

To print everything in paramMap, you actually don't have to call parent:

cvModel.bestModel.extractParamMap()

To answer OP's question, to get a single best parameter, for example regParam:

cvModel.bestModel.extractParamMap().apply(cvModel.bestModel.getParam("regParam"))

edited Mar 10 '20 at 01:36

Dima Lituiev

answered Mar 22 '18 at 03:53

Algorithman

Note that this solution works OK with a single object. It returns an empty map in the case of a Pipeline. – Jorge M. Londoño P. Aug 19 '19 at 21:56

score 3 · Answer 4 · edited Aug 04 '17 at 19:08

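This is how you get the chosen parameters (bestModel is typed as Model[_], so cast it to its concrete class first; LogisticRegressionModel is assumed here):

val best = cvModel.bestModel.asInstanceOf[LogisticRegressionModel]
println(best.getMaxIter)
println(best.getRegParam)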

edited Aug 04 '17 at 19:08

desertnaut

answered Nov 15 '16 at 09:43

Mazen Aly

Please don't add the same answer to multiple questions. Answer the best one and flag the rest as duplicates. See http://meta.stackexchange.com/questions/104227/is-it-acceptable-to-add-a-duplicate-answer-to-several-questions – Bhargav Rao Nov 15 '16 at 09:49

score 2 · Answer 5 · answered Jul 04 '17 at 08:52

This Java code should work: cvModel.bestModel().parent().extractParamMap(). You can translate it to Scala. The parent() method returns the estimator that produced the model, so you can read the best parameters from it.

answered Jul 04 '17 at 08:52

orangeHIX

This is the correct answer to pySpark as well! The key is "parent"! In pySpark, I use modelOnly.bestModel.stages[-1]._java_obj.parent().getRegParam(). – Lynn Chen Nov 18 '18 at 12:44

score 1 · Answer 6 · answered Jun 07 '16 at 03:30

This is the ParamGridBuilder()

paraGrid = ParamGridBuilder().addGrid(
    hashingTF.numFeatures, [10, 100, 1000]
).addGrid(
    lr.regParam, [0.1, 0.01, 0.001]
).build()

There are 3 stages in the pipeline. It seems we can access the parameters as follows:

for stage in cv_model.bestModel.stages:
    print('stage: {}'.format(stage))
    print(stage.params)
    print('\n')

stage: Tokenizer_46ffb9fac5968c6c152b
[Param(parent='Tokenizer_46ffb9fac5968c6c152b', name='inputCol', doc='input column name'), Param(parent='Tokenizer_46ffb9fac5968c6c152b', name='outputCol', doc='output column name')]

stage: HashingTF_40e1af3ba73764848d43
[Param(parent='HashingTF_40e1af3ba73764848d43', name='inputCol', doc='input column name'), Param(parent='HashingTF_40e1af3ba73764848d43', name='numFeatures', doc='number of features'), Param(parent='HashingTF_40e1af3ba73764848d43', name='outputCol', doc='output column name')]

stage: LogisticRegression_451b8c8dbef84ecab7a9
[]

However, the last stage, LogisticRegression, lists no parameters.

We can also get the weights and intercept from the logistic regression stage like this:

cv_model.bestModel.stages[1].getNumFeatures()
10
cv_model.bestModel.stages[2].intercept
1.5791827733883774
cv_model.bestModel.stages[2].weights
DenseVector([-2.5361, -0.9541, 0.4124, 4.2108, 4.4707, 4.9451, -0.3045, 5.4348, -0.1977, -1.8361])

Full exploration: http://kuanliang.github.io/2016-06-07-SparkML-pipeline/

score 1 · Answer 7 · answered Apr 24 '20 at 20:30

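[This SO thread](https://stackoverflow.com/questions/45225246/how-to-access-parameters-of-the-underlying-model-in-ml-pipeline) kinda answers the question.

In a nutshell, you need to cast each object to the class it actually is.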

For the case of CrossValidatorModel, the following is what I did:

import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.regression.RandomForestRegressionModel

// Load CV model from S3
val inputModelPath = "s3://path/to/my/random-forest-regression-cv"
val reloadedCvModel = CrossValidatorModel.load(inputModelPath)

// To get the parameters of the best model
(
    reloadedCvModel.bestModel
        .asInstanceOf[PipelineModel]
        .stages(1)
        .asInstanceOf[RandomForestRegressionModel]
        .extractParamMap()
)

In the example, my pipeline has two stages (a VectorIndexer and a RandomForestRegressor), so the stage index is 1 for my model.
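
If hardcoding the stage index feels brittle, one alternative is to look the stage up by its type. The following is a minimal sketch in plain Scala, not Spark API: `StageLookup` and `firstOfType` are hypothetical names introduced here for illustration, and the helper works on any sequence of objects.

```scala
import scala.reflect.ClassTag

object StageLookup {
  // Return the first element that is an instance of T, if there is one.
  // The ClassTag makes the type test in the pattern match work at runtime.
  def firstOfType[T: ClassTag](stages: Seq[Any]): Option[T] =
    stages.collectFirst { case t: T => t }
}
```

Applied to the model above, something like `StageLookup.firstOfType[RandomForestRegressionModel](pipelineModel.stages.toSeq)` would return an Option holding the random forest stage, where `pipelineModel` is assumed to be the result of casting `reloadedCvModel.bestModel` to PipelineModel. The Option makes the "stage not found" case explicit instead of failing on a wrong index or cast.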

score 0 · Answer 8 · answered Jun 27 '18 at 10:21

I am working with Spark Scala 1.6.x, and here is a full example of how I set up and fit a CrossValidator and then return the value of the parameter used to get the best model (assuming that training.toDF gives a DataFrame ready to be used):

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

// Instantiate a LogisticRegression object
val lr = new LogisticRegression()

// Instantiate a ParamGrid with different values for the 'RegParam' parameter of the logistic regression
val paramGrid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.0001, 0.001, 0.01, 0.1, 0.25, 0.5, 0.75, 1)).build()

// Setting and fitting the CrossValidator on the training set, using 'MulticlassClassificationEvaluator' as evaluator
val crossVal = new CrossValidator().setEstimator(lr).setEvaluator(new MulticlassClassificationEvaluator).setEstimatorParamMaps(paramGrid)
val cvModel = crossVal.fit(training.toDF)

// Getting the value of the 'RegParam' used to get the best model
val bestModel = cvModel.bestModel                    // Getting the best model
val paramReference = bestModel.getParam("regParam")  // Getting the reference of the parameter you want (only the reference, not the value)
val paramValue = bestModel.get(paramReference)       // Getting the value of this parameter
print(paramValue)                                    // In my case : 0.001

You can do the same for any parameter or any other type of model.

score 0 · Answer 9 · edited Oct 29 '18 at 04:02

In Java, as seen in the debugger, this works:

bestModel.parent().extractParamMap()

edited Oct 29 '18 at 04:02

Stephen Rauch

answered Oct 29 '18 at 03:43

裴帅帅

score 0 · Answer 10 · answered Aug 19 '19 at 21:58

Building on the solution of @macfeliga, a one-liner that works for pipelines:

cvModel.bestModel.asInstanceOf[PipelineModel]
    .stages.foreach(stage => println(stage.extractParamMap))

answered Aug 19 '19 at 21:58

Jorge M. Londoño P.

score 0 · Answer 11 · answered Apr 30 '20 at 06:46

For me, the @orangeHIX solution is perfect:

val cvModel = cv.fit(training)

val cvMejorModelo = cvModel.bestModel.asInstanceOf[ALSModel]

cvMejorModelo.parent.extractParamMap()

res86: org.apache.spark.ml.param.ParamMap =
{
    als_08eb64db650d-alpha: 0.05,
    als_08eb64db650d-checkpointInterval: 10,
    als_08eb64db650d-coldStartStrategy: drop,
    als_08eb64db650d-finalStorageLevel: MEMORY_AND_DISK,
    als_08eb64db650d-implicitPrefs: false,
    als_08eb64db650d-intermediateStorageLevel: MEMORY_AND_DISK,
    als_08eb64db650d-itemCol: product,
    als_08eb64db650d-maxIter: 10,
    als_08eb64db650d-nonnegative: false,
    als_08eb64db650d-numItemBlocks: 10,
    als_08eb64db650d-numUserBlocks: 10,
    als_08eb64db650d-predictionCol: prediction,
    als_08eb64db650d-rank: 1,
    als_08eb64db650d-ratingCol: rating,
    als_08eb64db650d-regParam: 0.1,
    als_08eb64db650d-seed: 1994790107,
    als_08eb64db650d-userCol: user
}
