I try to do some binary classification with the flink-ml svm implementation. When I evaluated the classification I got a ~85% error rate on the training dataset. I plotted the 3D data and it looked like you could separate the data quite well with a hyperplane.
When I tried to get the weight vector out of the svm I only saw the option to get the weight vector without the interception of the hyperplane. So just a hyperplane going through (0,0,0).
I don't have any clue where the error could be and appreciate every clue.
val env = ExecutionEnvironment.getExecutionEnvironment
val input: DataSet[(Int, Int, Boolean, Double, Double, Double)] = env.readCsvFile(filepathTraining, ignoreFirstLine = true, fieldDelimiter = ";")
val inputLV = input.map(
t => { LabeledVector({if(t._3) 1.0 else -1.0}, DenseVector(Array(t._4, t._5, t._6)))}
)
val trainTestDataSet = Splitter.trainTestSplit(inputLV, 0.8, precise = true, seed = 100)
val trainLV = trainTestDataSet.training
val testLV = trainTestDataSet.testing
val svm = SVM()
svm.fit(trainLV)
val testVD = testLV.map(lv => (lv.vector, lv.label))
val evalSet = svm.evaluate(testVD)
// groups the data in false negatives, false positives, true negatives, true positives
evalSet.map(t => (t._1, t._2, 1)).groupBy(0,1).reduce((x1,x2) => (x1._1, x1._2, x1._3 + x2._3)).print()
The plotted data is shown here: