I have one continuous feature, 'Tenure', and one categorical feature, 'Nationality', in my sample. The observations cover more than 50 different nationalities and 30 different tenures (0-30 years). In Spark ML, to identify which features are categorical, you need to set maxCategories on a VectorIndexer, as below, before creating a DecisionTreeClassifier model.
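import org.apache.spark.ml.feature.VectorIndexer

// Features with at most 5 distinct values are treated as categorical
val featureIndexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("indexedFeatures")
  .setMaxCategories(5)
  .fit(vecDF)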
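But in this case it does not work, because 'Tenure' is continuous yet has fewer distinct values than 'Nationality': any maxCategories value high enough to mark 'Nationality' as categorical also marks 'Tenure' as categorical. Is there a way to specify explicitly which features are categorical, as with categoricalFeaturesInfo in Spark MLlib (shown below)? Thanks
val categoricalFeaturesInfo = Map[Int, Int]()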
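To make the problem concrete, here is a minimal sketch in plain Scala (no Spark) of the distinct-value threshold as I understand it: a feature counts as categorical when it has at most maxCategories distinct values. The object name, the helper `categoricalIndices`, and the synthetic data (31 tenure values, exactly 50 nationality indices) are assumptions for illustration only, not part of the Spark API or my real data.

```scala
// Plain-Scala sketch (no Spark): mimics only the distinct-value threshold.
object MaxCategoriesSketch {
  // Hypothetical helper: indices of the columns whose number of distinct
  // values is <= maxCategories.
  def categoricalIndices(rows: Seq[Array[Double]], maxCategories: Int): Set[Int] = {
    val numFeatures = rows.head.length
    (0 until numFeatures).filter { j =>
      rows.map(_(j)).distinct.size <= maxCategories
    }.toSet
  }

  // Synthetic data (assumed): feature 0 = tenure in years (0..30),
  // feature 1 = nationality index (0..49).
  val rows: Seq[Array[Double]] =
    (0 until 1550).map(i => Array((i % 31).toDouble, (i % 50).toDouble))

  def main(args: Array[String]): Unit = {
    for (m <- Seq(5, 31, 50))
      println(s"maxCategories=$m -> categorical features: ${categoricalIndices(rows, m)}")
  }
}
```

With this data, no threshold selects feature 1 ('Nationality') alone: a low value selects neither feature, 31 selects only 'Tenure', and 50 selects both.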