
Is there a way to count the number of unique words in a time window with Flink Streaming? I have seen this question, but I don't know how to implement the time window.

FlinkNoob
1 Answer


Sure, that's pretty straightforward. If you want an aggregation across all of the input records during each time window, then you'll need to use one of the flavors of windowAll(), which means you won't be using a KeyedStream and cannot operate in parallel.

You'll need to decide if you want tumbling windows or sliding windows, and whether you are operating in event time or processing time.
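
For example, here is a minimal sketch of those choices (the source and variable names below are placeholders, not part of the pipeline further down): timeWindowAll(size) gives tumbling windows, timeWindowAll(size, slide) gives sliding windows, and the environment's time characteristic controls whether they use event time or processing time.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Processing time is the default; switch to event time explicitly if that's what you need.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<String> words = env.socketTextStream("localhost", 9999); // placeholder source

// Tumbling 15-minute windows:
words.timeWindowAll(Time.minutes(15));

// Sliding 15-minute windows, evaluated every 5 minutes:
words.timeWindowAll(Time.minutes(15), Time.minutes(5));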

But roughly speaking, you'll do something like this:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.addSource( ... )
    // collect all records into one 15-minute (tumbling) window, non-parallel
    .timeWindowAll(Time.minutes(15))
    // count the distinct words in each window
    .apply(new UniqueWordCounter())
    .print();
env.execute();

Your UniqueWordCounter will be an AllWindowFunction that receives an iterable of all the words in a window and returns the number of unique words.
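
For reference, a UniqueWordCounter along those lines could look roughly like this (a sketch that assumes the stream elements are plain String words; adjust the types to your input):

import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.util.HashSet;
import java.util.Set;

public class UniqueWordCounter implements AllWindowFunction<String, Long, TimeWindow> {
    @Override
    public void apply(TimeWindow window, Iterable<String> words, Collector<Long> out) {
        // Deduplicate the window's words with a set and emit the count.
        Set<String> unique = new HashSet<>();
        for (String word : words) {
            unique.add(word);
        }
        out.collect((long) unique.size());
    }
}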

On the other hand, if you are using a KeyedStream and want to count unique words for each key, modify your application accordingly:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.addSource( ... )
    .keyBy( ... )
    // one 15-minute (tumbling) window per key, evaluated in parallel
    .timeWindow(Time.minutes(15))
    // count the distinct words per key in each window
    .apply(new UniqueWordCounter())
    .print();
env.execute();
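
In the keyed case, the counter would be a WindowFunction instead, which also receives the key so the count can be reported per key. A rough sketch (named KeyedUniqueWordCounter here to distinguish it from the non-keyed version above), assuming the records are Tuple2<String, String> pairs of (key, word); the names and types are illustrative:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.util.HashSet;
import java.util.Set;

public class KeyedUniqueWordCounter
        implements WindowFunction<Tuple2<String, String>, Tuple2<String, Long>, String, TimeWindow> {
    @Override
    public void apply(String key, TimeWindow window,
                      Iterable<Tuple2<String, String>> records,
                      Collector<Tuple2<String, Long>> out) {
        // Count the distinct words seen for this key within this window.
        Set<String> unique = new HashSet<>();
        for (Tuple2<String, String> record : records) {
            unique.add(record.f1);
        }
        out.collect(Tuple2.of(key, (long) unique.size()));
    }
}
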
David Anderson
  • But the problem is: for each key I need to know the unique values, so I need to use "keyBy". – FlinkNoob Oct 26 '17 at 16:23
  • Apply is too generic; is there a way to do it without apply? – FlinkNoob Oct 27 '17 at 16:33
  • Or could you share an example of code, like the equivalent of the "sum" function but with the "apply" function? – FlinkNoob Oct 27 '17 at 17:35
  • It's easy to run Flink in IntelliJ and do some experiments. That's a pretty effective way to figure out how these various functions work. You may find this site helpful for getting started: http://training.data-artisans.com/ – David Anderson Oct 27 '17 at 18:48