Questions tagged [apache-flink]

Apache Flink

Apache Flink is an open-source platform for scalable batch and stream data processing. It supports batch and streaming analytics in one system, and analytical programs can be written in concise, elegant APIs in Java and Scala. The batch WordCount below illustrates the Scala API:

import org.apache.flink.api.scala._

case class WordWithCount(word: String, count: Int)

// `path` and `outputPath` are placeholders for the input and output locations.
val env = ExecutionEnvironment.getExecutionEnvironment
val text = env.readTextFile(path)

val counts = text.flatMap { _.split("\\W+") }
  .map { WordWithCount(_, 1) }
  .groupBy("word")
  .sum("count")

counts.writeAsCsv(outputPath)
env.execute("WordCount")
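
Since the same system also runs streaming programs, a streaming counterpart may help. The sketch below is adapted from the classic socket WordCount, assuming the DataStream Scala API; the host and port are placeholder values.

import org.apache.flink.streaming.api.scala._

case class WordWithCount(word: String, count: Int)

val env = StreamExecutionEnvironment.getExecutionEnvironment

// Read lines from a socket (host/port are placeholders) and emit a
// rolling per-word count that updates as new elements arrive.
val counts = env.socketTextStream("localhost", 9999)
  .flatMap { _.split("\\W+") }
  .filter { _.nonEmpty }
  .map { WordWithCount(_, 1) }
  .keyBy("word")
  .sum("count")

counts.print()
env.execute("Streaming WordCount")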

These are some of the unique features of Flink:

  • Hybrid batch/streaming runtime that supports batch processing and data streaming programs.
  • Custom memory management to guarantee efficient, adaptive, and highly robust switching between in-memory and out-of-core data processing algorithms.
  • Flexible and expressive windowing semantics for data stream programs (see the sketch after this list).
  • Built-in program optimizer that chooses the proper runtime operations for each program.
  • Custom type analysis and serialization stack for high performance.
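
As a concrete illustration of the windowing bullet, here is a minimal sketch of common window choices on a keyed stream. It assumes the DataStream Scala API of the same era as the Scala 2.10 artifacts referenced below; WordWithCount mirrors the example above, and windowVariants is a hypothetical helper name.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

case class WordWithCount(word: String, count: Int)

// Hypothetical helper: the same keyed stream with different window types.
def windowVariants(words: DataStream[WordWithCount]): Unit = {
  val keyed = words.keyBy("word")

  // Tumbling processing-time window: one aggregate per key per minute.
  keyed.timeWindow(Time.minutes(1)).sum("count")

  // Sliding windows: 1 minute long, evaluated every 10 seconds.
  keyed.timeWindow(Time.minutes(1), Time.seconds(10)).sum("count")

  // Count window: fires once 100 elements have arrived for a key.
  keyed.countWindow(100).sum("count")
}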

Learn more about Flink at http://flink.apache.org.

Building Apache Flink from Source

Prerequisites for building Flink:

  • Unix-like environment (we use Linux, Mac OS X, and Cygwin)
  • git
  • Maven (at least version 3.0.4)
  • Java 6, 7, or 8 (note that Oracle's JDK 6 library will fail to build Flink, but it can run a pre-compiled package without problems)

Commands:

git clone https://github.com/apache/flink.git
cd flink
mvn clean package -DskipTests

Flink is now installed in the build-target directory.
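
To sanity-check the build, the distribution scripts in build-target can start and stop a local cluster (this assumes the standard Flink distribution layout; the web frontend then listens on http://localhost:8081):

./build-target/bin/start-cluster.sh
./build-target/bin/stop-cluster.sh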

Developing Flink

The Flink committers use IntelliJ IDEA and Eclipse IDE to develop the Flink codebase.

Minimal requirements for an IDE are:

  • Support for Java and Scala (also mixed projects)
  • Support for Maven with Java and Scala

IntelliJ IDEA

The IntelliJ IDE supports Maven out of the box and offers a plugin for Scala development.

Check out our Setting up IntelliJ guide for details.

Eclipse Scala IDE

For Eclipse users, we recommend using Scala IDE 3.0.3, based on Eclipse Kepler. While this is a slightly older version, we found it to be the version that works most robustly for a complex project like Flink.

Further details, and a guide to newer Scala IDE versions, can be found in the How to setup Eclipse docs.

Note: Before following this setup, make sure to run the build from the command line once (mvn clean package -DskipTests; see above).

  1. Download the Scala IDE (preferred) or install the plugin into Eclipse Kepler. See How to setup Eclipse for download links and instructions.
  2. Add the "macro paradise" compiler plugin to the Scala compiler: open "Window" -> "Preferences" -> "Scala" -> "Compiler" -> "Advanced" and put the path to the macro paradise jar file into the "Xplugin" field (typically "/home/-your-user-/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar"). Note: if you do not have the jar file, you probably did not run the command-line build.
  3. Import the Flink Maven projects ("File" -> "Import" -> "Maven" -> "Existing Maven Projects").
  4. During the import, Eclipse will ask whether to automatically install additional Maven build helper plugins.
  5. Close the "flink-java8" project. Since Eclipse Kepler does not support Java 8, you cannot develop this project.

Support

Don’t hesitate to ask!

Contact the developers and community on the mailing lists if you need any help.

Open an issue if you find a bug in Flink.

Documentation

The documentation of Apache Flink is located on the website: http://flink.apache.org or in the docs/ directory of the source code.

Fork and Contribute

This is an active open-source project. We are always open to people who want to use the system or contribute to it. Contact us if you are looking for implementation tasks that fit your skills. The How to Contribute guide on the Flink website describes how to contribute to Apache Flink.

About

Apache Flink is an open source project of The Apache Software Foundation (ASF). The Apache Flink project originated from the Stratosphere research project.

7452 questions

163 votes, 4 answers
What is/are the main difference(s) between Flink and Storm?
Flink has been compared to Spark, which, as I see it, is the wrong comparison because it compares a windowed event-processing system against micro-batching; similarly, it does not make that much sense to me to compare Flink to Samza. In both cases…
fnl

115 votes, 3 answers
What are the benefits of Apache Beam over Spark/Flink for batch processing?
Apache Beam supports multiple runner backends, including Apache Spark and Flink. I'm familiar with Spark/Flink and I'm trying to see the pros/cons of Beam for batch processing. Looking at the Beam word count example, it feels it is very similar to…
bluenote10

35 votes, 4 answers
Difference between exactly-once and at-least-once guarantees
I'm studying distributed systems and referring to this old question: stackoverflow link. I really can't understand the difference between exactly-once, at-least-once, and at-most-once guarantees; I read about these concepts in Kafka, Flink, and Storm and…
Akinn

30 votes, 4 answers
could not find implicit value for evidence parameter of type org.apache.flink.api.common.typeinfo.TypeInformation[...]
I am trying to write some use cases for Apache Flink. One error I run into pretty often is "could not find implicit value for evidence parameter of type org.apache.flink.api.common.typeinfo.TypeInformation[SomeType]". My problem is that I can't really…
jheyd

27 votes, 2 answers
Apache Flink vs Apache Spark as platforms for large-scale machine learning?
Could anyone compare Flink and Spark as platforms for machine learning? Which is potentially better for iterative algorithms? Link to the general Flink vs Spark discussion: What is the difference between Apache Spark and Apache Flink?
Alexander

26 votes, 3 answers
What is the difference between mini-batch vs real time streaming in practice (not theory)?
What is the difference between mini-batch and real-time streaming in practice (not theory)? In theory, I understand mini-batch is something that batches in a given time frame, whereas real-time streaming is more like doing something as the data arrives…

22 votes, 3 answers
Apache Flink - Difference between Checkpoints & Save points?
Can someone please help me understand the difference between Apache Flink's Checkpoints & Savepoints? While I read the documentation, I couldn't understand the difference! :s
Raja

21 votes, 2 answers
Some puzzles for the operator Parallelism in Flink
I just got the example below for parallelism and have some related questions: Is setParallelism(5) setting the parallelism to 5 just for sum, or for both flatMap and sum? Is it possible to set different parallelism for different…
YuFeng Shen

20 votes, 3 answers
Flink webui when running from IDE
I am trying to see my job in the web UI. I use createLocalEnvironmentWithWebUI; the code runs well in the IDE, but it is impossible to see my job at http://localhost:8081/#/overview. val conf: Configuration = new Configuration() import…
GermainGum

20 votes, 1 answer
How to implement HTTP sink correctly?
I want to send calculation results of my DataStream flow to another service over the HTTP protocol. I see two possible ways to implement it: use a synchronous Apache HttpClient client in the sink public class SyncHttpSink extends…
Maxim

19 votes, 1 answer
How to output one data stream to different outputs depending on the data?
In Apache Flink I have a stream of tuples. Let's assume a really simple Tuple1. The tuple can have an arbitrary value in its value field (e.g. 'P1', 'P2', etc.). The set of possible values is finite, but I don't know the full set beforehand…
Jan Thomä

18 votes, 1 answer
Combine two streams in Apache Flink regardless of window time
I have two data streams that I want to combine. The problem is that one data stream has a much higher frequency than the other, and there are times when one stream receives no events at all. Is it possible to use the last event from the one…
FLoppix

17 votes, 5 answers
Kafka Client Timeout of 60000ms expired before the position for partition could be determined
I'm trying to connect Flink to a Kafka consumer. I'm using Docker Compose to build 4 containers: zookeeper, kafka, Flink JobManager, and Flink TaskManager. For zookeeper and Kafka I'm using wurstmeister images, and for Flink I'm using the official…
Mahmoud Sultan

15 votes, 2 answers
Apache Flink: java.lang.NoClassDefFoundError
I'm trying to follow this example, but when I try to compile it, I get this error: Error: Unable to initialize main class com.amazonaws.services.kinesisanalytics.aws Caused by: java.lang.NoClassDefFoundError:…

15 votes, 5 answers
Could not resolve substitution to a value: ${akka.stream.materializer} in AWS Lambda
I have a Java application in which I'm using the Flink API. Basically, what I'm trying to do is create two DataSets with a few records and then register them as two tables along with the necessary fields. DataSet comp =…
Kulasangar