
I have the following code. My goal is to group messages by a given key and a 10-second window, and to count the total amount accumulated for a particular key within a particular window.

I read that I need to have caching enabled and a cache size declared. I am also advancing the wall-clock time to force the windowing to kick in and split the elements into two separate groups. You can see my expectations for the given code in the two assertions.

Unfortunately this code fails them and it does so in two ways:

  1. it emits a result of the reduce operation every time it runs, instead of using the cache on the store and emitting a single total value
  2. windows are not respected, as can be seen from the output

Can you please explain how I am misunderstanding the mechanics of Kafka Streams in this case?
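For reference, here is a minimal Python sketch (not Kafka Streams, and not the code in question, which is omitted above) of the tumbling-window bucketing I expect: each record is assigned to the window whose start is its timestamp floored to the 10-second window size, and totals are accumulated per (key, window) pair. The record values and timestamps are made up for illustration.

```python
WINDOW_MS = 10_000  # hypothetical 10-second tumbling window

def window_start(timestamp_ms: int) -> int:
    """Align a record timestamp to the start of its tumbling window."""
    return timestamp_ms - (timestamp_ms % WINDOW_MS)

def aggregate(records):
    """records: iterable of (key, amount, timestamp_ms).
    Returns {(key, window_start): total} -- the final per-window totals
    I expect a windowed aggregation to produce."""
    totals = {}
    for key, amount, ts in records:
        bucket = (key, window_start(ts))
        totals[bucket] = totals.get(bucket, 0) + amount
    return totals

# Two records land in the first window, one in the next window
records = [("a", 1, 1_000), ("a", 2, 5_000), ("a", 3, 12_000)]
print(aggregate(records))
# {('a', 0): 3, ('a', 10000): 3}
```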

zaxme
    I guess you should use the suppress() method. https://kafka.apache.org/21/documentation/streams/developer-guide/dsl-api.html#id31 – user152468 Mar 13 '19 at 13:02
  • Thank you, @user152468. It seems that for some reason the Scala DSL for Kafka Streams doesn't expose the `suppress` method. I am going to try and integrate it with `.inner`. – zaxme Mar 13 '19 at 13:18
  • Adding suppress() to Scala API is WIP: https://issues.apache.org/jira/browse/KAFKA-7778 – Matthias J. Sax Mar 17 '19 at 17:24
  • You might be interested in: https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/ – Matthias J. Sax Mar 17 '19 at 17:26
  • Reference links seem to be broken, please refer to https://stackoverflow.com/questions/51779405/understanding-kafka-stream-groupby-and-window?rq=2 instead – frblazquez Apr 28 '23 at 07:37
  • I changed GitHub aliases. Links should work now. – zaxme Apr 29 '23 at 09:48
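The `suppress()` operator the comments point to holds back intermediate updates and forwards a window's result only once the window has closed. As a rough illustration of that emission pattern (a plain Python simulation with a zero grace period and made-up data, not the real Kafka Streams API):

```python
WINDOW_MS = 10_000  # hypothetical 10-second tumbling window, zero grace period

def process(records):
    """records: iterable of (key, amount, timestamp_ms) in timestamp order.
    Buffers per-window totals and emits each one only after stream time
    passes the window end, instead of emitting every intermediate update."""
    buffered = {}  # (key, window_start) -> running total
    emitted = []
    for key, amount, ts in records:
        # emit (and drop) every buffered window that has now closed
        for k, w in [kw for kw in buffered if kw[1] + WINDOW_MS <= ts]:
            emitted.append((k, w, buffered.pop((k, w))))
        w = ts - ts % WINDOW_MS
        buffered[(key, w)] = buffered.get((key, w), 0) + amount
    # flush anything still open at end of input
    emitted.extend((k, w, total) for (k, w), total in buffered.items())
    return emitted

events = [("a", 1, 1_000), ("a", 2, 5_000), ("a", 3, 12_000)]
print(process(events))
# [('a', 0, 3), ('a', 10000, 3)]
```

The key difference from an unsuppressed aggregation is that the first window's total of 3 is emitted exactly once (when the record at 12,000 ms moves stream time past the window end), rather than once per update.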

0 Answers