I have a problem with ml.CrossValidator in Scala Spark when using OneHotEncoder.
This is my code:
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{CountVectorizer, OneHotEncoder, StringIndexer, Tokenizer}

// Split the raw text column into tokens
val tokenizer = new Tokenizer().
  setInputCol("subjects").
  setOutputCol("subject")
// CountVectorizer / TF: turn tokens into term-count feature vectors
val countVectorizer = new CountVectorizer().
  setInputCol("subject").
  setOutputCol("features")
// Convert the string labels into numeric indices
val labelIndexer = new StringIndexer().
  setInputCol("labelss").
  setOutputCol("labelsss")
// One-hot encode the numeric label index
val labelEncoder = new OneHotEncoder().
  setInputCol("labelsss").
  setOutputCol("label")
val logisticRegression = new LogisticRegression()
val pipeline = new Pipeline().
  setStages(Array(tokenizer, countVectorizer, labelIndexer, labelEncoder, logisticRegression))
It gives me an error like this:
cv: org.apache.spark.ml.tuning.CrossValidator = cv_8cc1ae985e39
java.lang.IllegalArgumentException: requirement failed: Column label must be of type NumericType but was actually of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7.
I have no idea how to fix it.
I need the one-hot encoder because my label is categorical.
Thanks for helping me :)