I have the following code, in which I am trying to create the label with StringIndexer and the features with VectorAssembler:
StructType schema = createStructType(new StructField[]{
        createStructField("id", IntegerType, false),
        createStructField("country", StringType, false),
        createStructField("hour", IntegerType, false),
        createStructField("clicked", DoubleType, false)
});
List<Row> data = Arrays.asList(
        RowFactory.create(7, "US", 18, 1.0),
        RowFactory.create(8, "CA", 12, 0.0),
        RowFactory.create(9, "NZ", 15, 0.0)
);
Dataset<Row> dataset = sparkSession.createDataFrame(data, schema);

StringIndexer indexer = new StringIndexer()
        .setInputCol("clicked")
        .setOutputCol("label");
Dataset<Row> ds = indexer.fit(dataset).transform(dataset);

VectorAssembler assembler = new VectorAssembler()
        .setInputCols(new String[]{"id", "country", "hour"})
        .setOutputCol("features");
Dataset<Row> finalDS = assembler.transform(ds);

LogisticRegression lr = new LogisticRegression()
        .setMaxIter(10)
        .setRegParam(0.3)
        .setElasticNetParam(0.8);

// Fit the model
LogisticRegressionModel lrModel = lr.fit(finalDS);
Dataset<Row> output = lrModel.transform(finalDS);
output.select("features", "label").show();
When I submit it to Spark, I get the following error message:
7/04/27 22:34:24 INFO DAGScheduler: Job 0 finished: countByValue at StringIndexer.scala:92, took 1.003742 s
Exception in thread "main" java.lang.IllegalArgumentException: Data type StringType is not supported.
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$transformSchema$1.apply(VectorAssembler.scala:121)
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$transformSchema$1.apply(VectorAssembler.scala:117)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:117)
    at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
    at org.apache.spark.ml.feature.VectorAssembler.transform(VectorAssembler.scala:54)
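The trace shows the exception is thrown by VectorAssembler.transformSchema, not by StringIndexer (the StringIndexer job finished). VectorAssembler accepts only numeric, boolean, and vector input columns, and "country" is StringType. A minimal sketch of one way around this, assuming the Spark 2.x Java ML API and a local master, is to run "country" through its own StringIndexer and assemble the resulting numeric index instead. The class name IndexThenAssemble, the method run, and the column name countryIndex are hypothetical names chosen for illustration.

```java
import static org.apache.spark.sql.types.DataTypes.*;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.feature.StringIndexer;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical class: same data and model as the question, but "country"
// is indexed to a number before it reaches VectorAssembler.
public class IndexThenAssemble {

    public static Dataset<Row> run(SparkSession spark) {
        StructType schema = createStructType(new StructField[]{
                createStructField("id", IntegerType, false),
                createStructField("country", StringType, false),
                createStructField("hour", IntegerType, false),
                createStructField("clicked", DoubleType, false)
        });
        List<Row> data = Arrays.asList(
                RowFactory.create(7, "US", 18, 1.0),
                RowFactory.create(8, "CA", 12, 0.0),
                RowFactory.create(9, "NZ", 15, 0.0));
        Dataset<Row> dataset = spark.createDataFrame(data, schema);

        // "clicked" -> "label", exactly as in the question.
        StringIndexer labelIndexer = new StringIndexer()
                .setInputCol("clicked")
                .setOutputCol("label");

        // "country" is StringType, which VectorAssembler rejects,
        // so map it to a numeric column ("countryIndex" is a hypothetical name).
        StringIndexer countryIndexer = new StringIndexer()
                .setInputCol("country")
                .setOutputCol("countryIndex");

        // Only numeric columns are passed to the assembler now.
        VectorAssembler assembler = new VectorAssembler()
                .setInputCols(new String[]{"id", "countryIndex", "hour"})
                .setOutputCol("features");

        // Uses the default "features" and "label" column names.
        LogisticRegression lr = new LogisticRegression()
                .setMaxIter(10)
                .setRegParam(0.3)
                .setElasticNetParam(0.8);

        // A Pipeline fits each indexer and then the model, in order.
        Pipeline pipeline = new Pipeline().setStages(
                new PipelineStage[]{labelIndexer, countryIndexer, assembler, lr});
        PipelineModel model = pipeline.fit(dataset);
        return model.transform(dataset);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("IndexThenAssemble")
                .master("local[*]") // assumption: local run for testing
                .getOrCreate();
        run(spark).select("features", "label").show();
        spark.stop();
    }
}
```

A raw index imposes an artificial order on countries, so for a nominal feature like this a one-hot encoding of countryIndex is usually the better input to the assembler; OneHotEncoder's API differs between Spark 2.x and 3.x, which is why it is left out of this sketch. Also, since "clicked" already holds 0.0/1.0, it could be used directly as the label via setLabelCol("clicked") instead of indexing it.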