The problem is that Flink's binary distribution does not contain the libraries (flink-ml, gelly, etc.). This means that you have to either ship the library jars together with your job jar or copy them manually to your cluster. I strongly recommend the first option.
Building a fat-jar to include library jars
The easiest way to build a fat jar which does not contain unnecessary jars is to use Flink's quickstart archetype to set up the project's pom.
mvn archetype:generate -DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-scala -DarchetypeVersion=0.9.0
This creates the structure for a Flink project using the Scala API. The generated pom file contains the following dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-scala</artifactId>
        <version>0.9.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala</artifactId>
        <version>0.9.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients</artifactId>
        <version>0.9.0</version>
    </dependency>
</dependencies>
If your job does not use the streaming API, you can remove flink-streaming-scala and instead insert the following dependency in order to include Flink's machine learning library:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-ml</artifactId>
    <version>0.9.0</version>
</dependency>
When you now build the job jar with mvn package, the generated jar should contain flink-ml and all of its transitive dependencies.
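To double-check that the build actually bundled the library, you can list the jar's entries and look for the org/apache/flink/ml package. This is a small sketch that assumes the JDK's jar tool is on your PATH; the helper name has_flink_ml and the jar path in the usage line are made up for illustration, so adjust them to your project's artifactId and version.

```shell
# Hypothetical helper: succeeds if the given jar contains any entry from the
# org.apache.flink.ml package. Requires the JDK's `jar` tool on the PATH.
has_flink_ml() {
  local job_jar="$1"
  # `jar tf` prints one entry per line, e.g. org/apache/flink/ml/...
  jar tf "$job_jar" | grep -q '^org/apache/flink/ml/'
}

# Example usage (the jar name is a placeholder for your own artifact):
# has_flink_ml target/my-job-1.0-SNAPSHOT.jar && echo "flink-ml is bundled"
```

If the check fails, the jar was most likely built without the fat-jar configuration from the quickstart pom.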
Copying the library jars manually to the cluster
Flink adds all jars located in the <FLINK_ROOT_DIR>/lib folder to the classpath of the executed jobs. Thus, in order to use Flink's machine learning library, you have to put the flink-ml jar and all needed transitive dependencies into the <FLINK_ROOT_DIR>/lib folder on every node of your cluster. This is rather tricky, since you have to figure out which transitive dependencies your algorithm actually needs; consequently, you will often end up copying all of them.
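If you go this route anyway, the maven-dependency-plugin can at least collect the jars for you: its copy-dependencies goal copies every resolved dependency of your project, transitive ones included, into a directory. The sketch below assumes you run it from your job project's root and pass <FLINK_ROOT_DIR> as the first argument; copy_libs_to_flink and the default staging directory target/flink-libs are hypothetical names, not part of Flink or Maven. Note that it copies everything, including Flink's own modules that the distribution already ships, so you may want to prune the staging directory before the final cp.

```shell
# Hypothetical helper: collect the project's runtime dependencies (including
# transitive ones) with maven-dependency-plugin and copy them into Flink's lib.
# Usage: copy_libs_to_flink <FLINK_ROOT_DIR> [staging_dir]
copy_libs_to_flink() {
  local flink_root="$1"
  local staging_dir="${2:-target/flink-libs}"

  # copy-dependencies resolves the full dependency tree of the current pom;
  # includeScope=runtime keeps compile- and runtime-scoped jars.
  mvn dependency:copy-dependencies \
    -DincludeScope=runtime \
    -DoutputDirectory="$staging_dir" || return 1

  # Review/prune "$staging_dir" here if you only want a subset of the jars.
  cp "$staging_dir"/*.jar "$flink_root/lib/"
}
```

Keep in mind that the lib folder is only read when the Flink processes start, so restart the cluster after copying the jars.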
How to build a specific sub-module with Maven
In order to build a specific sub-module X from your parent project, you can use the following command:
mvn clean package -pl X -am
-pl allows you to specify which sub-modules you want to build, and -am tells Maven to also build the sub-modules they depend on. It is also described here.
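For example, building only the ML library (plus the modules it depends on) from a Flink source checkout could look like the sketch below. build_module is a hypothetical wrapper, and the module path flink-staging/flink-ml is my assumption about where flink-ml lives in the 0.9 source tree; -pl also accepts the :artifactId form (e.g. :flink-ml) if the path differs in your checkout. -DskipTests is optional and only shortens the build.

```shell
# Hypothetical wrapper around the command above.
#   -pl <module>  : build only this module (relative path or :artifactId)
#   -am           : also build the modules it depends on
#   -DskipTests   : optional, skips running the tests to save time
build_module() {
  local module="$1"
  mvn clean package -pl "$module" -am -DskipTests
}

# Example (module path assumed for the Flink 0.9 source tree):
# build_module flink-staging/flink-ml
```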