Triggering execution of a LinearRegression model in Flink -> slower than Spark?

Question

I've developed a multiple linear regression and a K-means job in both Spark and Flink to compare their batch performance. I'm using Zeppelin to write and run the code, and Ganglia to measure.

I read in the answer to this post that I have to trigger the execution of the train method, so I did.

However, for linear regression Flink takes 3 minutes 27 seconds just in the trigger part, while Spark takes around 30 seconds for the whole execution. A gap that large doesn't seem plausible, so I think I'm doing something wrong.

Flink is also slower than Spark in the K-means comparison.
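In Flink's DataSet API, readLibSVM and fit only build a job plan; the plan is executed when an action such as collect() is called. The 3 min 27 s measured around the trigger therefore already includes reading the file and running every training iteration, so it covers roughly the same work as Spark's whole-execution time, provided both are measured the same way. Below is a minimal sketch of such a measurement in plain Scala, assuming only the standard library. Timing and timed are hypothetical names, not part of Flink or Spark.

```scala
object Timing {
  /** Evaluates `body` once and returns its result together with the
    * elapsed wall-clock time in seconds. `body` is a by-name parameter,
    * so nothing runs until this method evaluates it. */
  def timed[T](body: => T): (T, Double) = {
    val start = System.nanoTime()
    val result = body
    val elapsedSeconds = (System.nanoTime() - start) / 1e9
    (result, elapsedSeconds)
  }
}

// Hypothetical usage in the Flink notebook, with mlr as in the question:
//   val (weights, secs) = Timing.timed { mlr.weightsOption.get.collect() }
// Wrapping the call that actually runs the Spark job in the same way
// gives two numbers measured on the same basis.
```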

This is my code:

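import org.apache.flink.api.scala._
import org.apache.flink.ml.MLUtils
import org.apache.flink.ml.common.LabeledVector
import org.apache.flink.ml.regression.MultipleLinearRegression

// Read the data (benv is the batch ExecutionEnvironment that Zeppelin provides)
val data: DataSet[LabeledVector] = MLUtils.readLibSVM(benv, "/.../quake_test_I.libsvm")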

// Example of the data (LibSVM format: label index:value ...)
// 6.1 1:33.0 2:53.26 3:-161.74
// 5.8 1:45.0 2:51.34 3:173.44
// 5.9 1:17.0 2:28.62 3:142.42
// 5.8 1:28.0 2:52.73 3:171.99

// Create the multiple linear regression learner
val mlr = MultipleLinearRegression()
  .setIterations(10)
  .setStepsize(0.5)
  .setConvergenceThreshold(0.001)

// Train the model: fit returns Unit and stores the (not yet computed)
// weights DataSet in mlr.weightsOption
mlr.fit(data)

// Trigger the execution: collect() runs the job and returns the weights
val weights = mlr.weightsOption match {
  case Some(weights) => weights.collect()
  case None => throw new Exception("Could not calculate the weights.")
}
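To make the three learner parameters concrete, here is plain batch gradient descent on squared loss in Scala, where iterations, stepsize and convergenceThreshold play the roles of setIterations, setStepsize and setConvergenceThreshold. This is a simplified single-machine illustration under that assumption, not Flink ML's implementation: Flink distributes each iteration over the DataSet, and its step-size schedule and convergence test may differ. BatchGD, Example and fit are hypothetical names.

```scala
object BatchGD {
  // One training example: label y and a dense feature vector x.
  final case class Example(y: Double, x: Array[Double])

  /** Fits weights w and intercept b minimizing the mean squared error
    * (1 / 2n) * sum((w . x + b - y)^2) with fixed-step batch gradient descent.
    * Stops after `iterations` steps, or earlier once the relative decrease in
    * loss between two steps is below `convergenceThreshold` (this also stops
    * the loop if the loss goes up). */
  def fit(data: Seq[Example], iterations: Int, stepsize: Double,
          convergenceThreshold: Double): (Array[Double], Double) = {
    require(data.nonEmpty, "need at least one example")
    val d = data.head.x.length
    val n = data.size.toDouble
    val w = Array.fill(d)(0.0) // weights, updated in place
    var b = 0.0                // intercept

    def predict(x: Array[Double]): Double =
      (0 until d).map(i => w(i) * x(i)).sum + b

    def loss(): Double =
      data.map { e => val r = predict(e.x) - e.y; r * r }.sum / (2 * n)

    var prevLoss = loss()
    var iter = 0
    var converged = false
    while (iter < iterations && !converged) {
      // Full-batch gradient: average of residual * x, and of residual for b.
      val gradW = Array.fill(d)(0.0)
      var gradB = 0.0
      for (e <- data) {
        val r = predict(e.x) - e.y
        for (i <- 0 until d) gradW(i) += r * e.x(i) / n
        gradB += r / n
      }
      // Step against the gradient; all gradients above used the old w and b.
      for (i <- 0 until d) w(i) -= stepsize * gradW(i)
      b -= stepsize * gradB

      val curLoss = loss()
      converged = prevLoss == 0.0 ||
        (prevLoss - curLoss) / prevLoss < convergenceThreshold
      prevLoss = curLoss
      iter += 1
    }
    (w, b)
  }
}
```

With the question's settings the call would be BatchGD.fit(examples, iterations = 10, stepsize = 0.5, convergenceThreshold = 0.001). In this sketch each iteration is one full pass over the training data, so the iteration count directly scales the work done.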

How should I trigger the execution of this model?

Thanks for your help! :)
