I use Spark MLlib to run an SVM classification on an RDD of LabeledPoints, and I want to cross-validate it. What is the best way to do that? Does anyone have example code? I found the CrossValidator class, but it relies on a DataFrame.

My aim is to obtain the F-score.

zero323
jp_

2 Answers

I faced the same issue for over a month until I realized that I had to use the ML API instead of the MLlib API (more about the differences between the two here). In that case, the SVM in the new API is LinearSVC:

from pyspark.ml.classification import RandomForestClassifier, LinearSVC
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder, CrossValidatorModel
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# SVM
crossval = CrossValidator(estimator=LinearSVC(),
                          estimatorParamMaps=ParamGridBuilder().build(),
                          evaluator=MulticlassClassificationEvaluator(metricName='f1'),
                          numFolds=5,
                          parallelism=4)

# Random Forest
crossval = CrossValidator(estimator=RandomForestClassifier(),
                          estimatorParamMaps=ParamGridBuilder().build(),
                          evaluator=MulticlassClassificationEvaluator(metricName='f1'),
                          numFolds=5,
                          parallelism=4)

In both cases you can then fit the model on a DataFrame (here `train_df` stands for your training DataFrame with `features` and `label` columns):

cross_model: CrossValidatorModel = crossval.fit(train_df)
Genarito

You can find a complete example on Spark's GitHub, though it uses logistic regression rather than SVM.

The best way is to convert your RDD into a DataFrame using the rdd.toDF() method.

Mateusz Dymczyk
    Thanks so far. In the example a LogisticRegression object is instantiated and inserted into the pipeline. I can't find any SVM to instantiate that fits into the pipeline, though. Which class should I use? – jp_ Mar 11 '16 at 15:25