
I have trained a multiple linear regression model, and now I want to use it to make predictions.

From reading the documentation, I understand that the input is a LabeledVector and the output is a DataSet of tuples (InputValue, PredictedValue). Is that right?

I create my LabeledVectors:

val mapped = data.map {x => new org.apache.flink.ml.common.LabeledVector (x._4, org.apache.flink.ml.math.DenseVector(x._1,x._2,x._3)) }

// Printed output:
mapped: org.apache.flink.api.scala.DataSet[org.apache.flink.ml.common.LabeledVector] = org.apache.flink.api.scala.DataSet@7d4fefdc
LabeledVector(6.7, DenseVector(33.0, -52.26, 28.3))
LabeledVector(5.8, DenseVector(36.0, 45.53, 150.93))
.....

Then, with my model created and trained, I call predict:

// Calculate the predictions for the test data
val predictions = mlr.predict(mapped)

I get this error:

java.lang.RuntimeException: There is no PredictOperation defined for org.apache.flink.ml.regression.MultipleLinearRegression which takes a DataSet[org.apache.flink.ml.common.LabeledVector] as input.

But, as you can see here, the official documentation says that it exists.

Thanks for your help! :)

Borja

1 Answer


Support for predicting LabeledVectors was removed with this commit. Unfortunately, the Flink documentation was not updated accordingly. I've created an issue to update it.

If you want to predict LabeledVectors, then you have to write your own PredictOperation which supports the respective types.
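A minimal sketch of such a PredictOperation, assuming the Flink 1.x FlinkML Scala API: the trait `org.apache.flink.ml.pipeline.PredictOperation` with `getModel` and `predict`, and an implicit `predictVectors[T <: Vector]` in the `MultipleLinearRegression` companion object (check that both match your Flink version). The names `LabeledVectorPrediction` and `predictLabeledVectors` are hypothetical. The operation delegates to the Vector-based one and simply ignores the label:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.{LabeledVector, ParameterMap, WeightVector}
import org.apache.flink.ml.math.Vector
import org.apache.flink.ml.pipeline.PredictOperation
import org.apache.flink.ml.regression.MultipleLinearRegression

// Hypothetical helper object: import its members so the implicit is in scope for predict().
object LabeledVectorPrediction {

  // Assumed shape: PredictOperation[Instance, Model, Testing, Prediction].
  implicit val predictLabeledVectors
      : PredictOperation[MultipleLinearRegression, WeightVector, LabeledVector, Double] =
    new PredictOperation[MultipleLinearRegression, WeightVector, LabeledVector, Double] {

      // Reuse FlinkML's Vector-based operation (assumed to exist) so the weight lookup
      // and the dot-product-plus-intercept logic are not duplicated here.
      private val vectorOp = MultipleLinearRegression.predictVectors[Vector]

      // Returns the fitted weights; the delegate throws if the model was never fitted.
      override def getModel(
          self: MultipleLinearRegression,
          predictParameters: ParameterMap): DataSet[WeightVector] =
        vectorOp.getModel(self, predictParameters)

      // Predicts from the feature vector only; the label is not used.
      override def predict(value: LabeledVector, model: WeightVector): Double =
        vectorOp.predict(value.vector, model)
    }
}
```

With `import LabeledVectorPrediction._` in scope, `mlr.predict(mapped)` should resolve to this operation and return (LabeledVector, Double) pairs, which keeps each true label next to its prediction.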

Till Rohrmann
  • **Thanks for your help!** Just out of curiosity, why did you decide to remove it? By the way, I think I found several more mistakes... What a pity, such a powerful tool with such documentation. If you need some young talent to fix it, I can help! ;) – Borja Jun 02 '17 at 18:47
  • How can I evaluate the algorithm's predictions, e.g. the residuals, mean squared error, R², etc.? – Borja Jun 02 '17 at 18:51
  • The Flink community always welcomes new contributors. So if you'd like to help, create some JIRA issues and start cracking :-) The reason we removed it was that it was a corner case added for evaluation. Instead, we wanted to develop a proper evaluation framework that lets you calculate measures for your predictions, exactly what you've asked for: mean squared error, residuals, etc. Since the evaluation framework is not finished yet, you would have to implement this on your own. – Till Rohrmann Jun 05 '17 at 18:44
  • I will! :) Although I'm afraid my level isn't high enough yet, haha. By the way, could you take a look at this post [link](https://stackoverflow.com/questions/44353875/trigger-execution-model-linearregression-in-flink-slower-than-spark)? I followed an answer you gave to someone else, but it doesn't work completely for me. I would really appreciate your help with that post because it's for my thesis. – Borja Jun 05 '17 at 18:59
  • There is no reason to be afraid. Every contributor is valuable for the community. – Till Rohrmann Jun 06 '17 at 07:27
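Until such an evaluation framework exists, the measures asked about in the comments can be computed by hand. Below is a minimal sketch in plain Scala; the object `RegressionMetrics` and its methods are hypothetical names. It takes (trueValue, predictedValue) pairs and computes the residuals, the mean squared error, and R² = 1 - SS_res / SS_tot:

```scala
// Hypothetical helper: each pair is (trueValue, predictedValue).
object RegressionMetrics {

  // Residual = true value minus predicted value.
  def residuals(pairs: Seq[(Double, Double)]): Seq[Double] =
    pairs.map { case (truth, prediction) => truth - prediction }

  // Mean of the squared residuals.
  def meanSquaredError(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one pair")
    residuals(pairs).map(r => r * r).sum / pairs.size
  }

  // Coefficient of determination: 1 - SS_res / SS_tot.
  def rSquared(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one pair")
    val mean = pairs.map(_._1).sum / pairs.size
    val ssRes = residuals(pairs).map(r => r * r).sum
    val ssTot = pairs.map { case (truth, _) => (truth - mean) * (truth - mean) }.sum
    require(ssTot > 0.0, "R² is undefined when all true values are equal")
    1.0 - ssRes / ssTot
  }
}
```

One way to obtain such pairs, assuming your FlinkML version provides `Predictor.evaluate` (the FlinkML quickstart uses it with (Vector, label) pairs), would be `mlr.evaluate(mapped.map(lv => (lv.vector, lv.label))).collect()`. Collecting to the client is only reasonable for test sets that fit in memory; for large data the same sums would have to be computed with DataSet aggregations.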