Questions tagged [flinkml]

FlinkML is the machine learning library for the Apache Flink distributed streaming engine.

FlinkML is the Machine Learning (ML) library for Flink. It is a new effort in the Flink community, with a growing list of algorithms and contributors. FlinkML aims to provide scalable ML algorithms, an intuitive API, and tools that help minimize glue code in end-to-end ML systems.

Getting Started

If you want to jump right in, you have to set up a Flink program. Next, you have to add the FlinkML dependency to the pom.xml of your project.

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-ml</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>

Now you can start solving your analysis task. The following code snippet shows how easy it is to train a multiple linear regression model.

// LabeledVector is a feature vector with a label (class or real value)
val trainingData: DataSet[LabeledVector] = ...
val testingData: DataSet[Vector] = ...

val mlr = MultipleLinearRegression()
  .setStepsize(1.0)
  .setIterations(100)
  .setConvergenceThreshold(0.001)

mlr.fit(trainingData, parameters)

// The fitted model can now be used to make predictions
val predictions: DataSet[LabeledVector] = mlr.predict(testingData)

Learn more about FlinkML here.

34 questions
7
votes
1 answer

What is the status of FlinkML?

Don't see much recent discussion about FlinkML - is it dying or dead? What are some examples of some interesting recent live usages?
johnlon
  • 167
  • 1
  • 11
7
votes
1 answer

Flink SVM 90% misclassification

I try to do some binary classification with the flink-ml svm implementation. When I evaluated the classification I got a ~85% error rate on the training dataset. I plotted the 3D data and it looked like you could separate the data quite well with a…
hucko
  • 81
  • 5
4
votes
1 answer

Real-Time streaming prediction in Flink using scala

Flink version : 1.2.0 Scala version : 2.11.8 I want to use a DataStream to predict using a model in flink using scala. I have a DataStream[String] in flink using scala which contains json formatted data from a kafka source.I want to use this…
4
votes
2 answers

FlinkMLTools NoClassDef when running jar built with maven

I'm working on a recommender system using Apache Flink. The implementation is running when I test it in IntelliJ, but I would like now to go on a cluster. I also built a jar file and tested it locally to see if all was working but I encountered a…
Kerial
  • 205
  • 2
  • 8
4
votes
1 answer

OutOfBoundsException with ALS - Flink MLlib

I'm doing a recommandation system for movies, using the MovieLens datasets available here : http://grouplens.org/datasets/movielens/ To compute this recommandation system, I use the ML library of Flink in scala, and particulalrly the ALS algorithm…
3
votes
1 answer

Extracting weights from FlinkML Multiple Linear Regression

I am running the example multiple linear regression for Flink (0.10-SNAPSHOT). I can't figure out how to extract the weights (e.g. slope and intercept, beta0-beta1, what ever you want to call them). I'm not super seasoned in Scala, that is…
2
votes
1 answer

Apache Flink load ML model from file

I'd like to know if there is a way(or some sort of code example) to load an encoded pre-trained model (written in python) inside a Flink streaming application. So I can fit the model using the weights loaded from the file system and the data coming…
elton stafa
  • 53
  • 1
  • 3
2
votes
1 answer

Embedd existing ML model in apache flink

we are training machine learning models offline and persist them in python pickle-files. We were wondering about the best way to embedd those pickeled-models into a stream (e.g. sensorInputStream > PredictionJob > OutputStream. Apache Flink ML seems…
Green Lomu
  • 65
  • 1
  • 9
2
votes
0 answers

Flink ML - java.lang.ClassNotFoundException: org.apache.flink.ml.math.DenseVector

When submitting a job to my Flink 1.8.1 cluster, it fails with the following exception: java.lang.ClassNotFoundException: org.apache.flink.ml.math.DenseVector However, the mentioned class seems to be in my jar according to: jar -tf myjar.jar | grep…
2
votes
1 answer

Apache Flink Stochastic Outlier Selection on Data Stream

I am trying to use the StochasticOutlierSelection model of the Apache Flink ML package. I cannot work out how to use it with Kafka as the data source, I understand it needs a DataSet rather than a DataStream, but I don't seem to be able to window…
2
votes
1 answer

FlinkML: Joining DataSets of LabeledVector does not work

I am currently trying to join two DataSets (part of the flink 0.10-SNAPSHOT API). Both DataSets have the same form: predictions: 6.932018685453303E155 DenseVector(0.0, 1.4, 1437.0) org: 2.0 DenseVector(0.0, 1.4, 1437.0) general…
Flow
  • 81
  • 6
1
vote
0 answers

Pyflink windowAll() by event-time to apply a clutering model

I'm a beginner on pyflink framework and I would like to know if my use case is possible with it ... I need to make a tumbling windows and apply a python udf (scikit learn clustering model) on it. The use case is : every 30 seconds I want to apply…
1
vote
1 answer

Using PyFlink with LightGBM

Is it possible to use PyFlink with python machine learning libraries such as LightGBM for a streaming application? Is there any good example for this?
1
vote
1 answer

Apache Flink - Prediction Handling

I am currently working with Apache Flink's SVM-Class to predict some text data. The class provides a predict-function which is taking a DataSet[Vector] as an input and gives me a DataSet[Prediction] as result. So far so good. My problem is, that i…
IboJaan
  • 63
  • 7
1
vote
1 answer

Streaming Predictions in Apache Flink

Is it possible to make predictions on a dataStream in Apache Flink using a model that is already trained in batch? The predict function from svm needs as input a dataset and does not take a datastream. Unfortunately I am not able to figure it out…
1
2 3