FlinkML is the machine learning library for the Apache Flink distributed streaming engine.
FlinkML is the Machine Learning (ML) library for Flink. It is a new effort in the Flink community, with a growing list of algorithms and contributors. FlinkML aims to provide scalable ML algorithms, an intuitive API, and tools that help minimize glue code in end-to-end ML systems.
Getting Started
If you want to jump right in, you have to set up a Flink program. Next, you have to add the FlinkML dependency to the pom.xml of your project.
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-ml</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
Now you can start solving your analysis task. The following code snippet shows how easy it is to train a multiple linear regression model.
// LabeledVector is a feature vector with a label (class or real value)
val trainingData: DataSet[LabeledVector] = ...
val testingData: DataSet[Vector] = ...
val mlr = MultipleLinearRegression()
.setStepsize(1.0)
.setIterations(100)
.setConvergenceThreshold(0.001)
mlr.fit(trainingData, parameters)
// The fitted model can now be used to make predictions
val predictions: DataSet[LabeledVector] = mlr.predict(testingData)
Learn more about FlinkML here.