
I have the following code. My goal is to group messages by a given key and a 10-second window, and to count the total amount accumulated for a particular key within a particular window.

I read that I need to have caching enabled and a cache size declared. I am also advancing the wall-clock time to force the windowing to kick in and split the elements into two separate groups. You can see my expectations for the given code in the two assertions.

Unfortunately this code fails them and it does so in two ways:

  1. it emits a result of the reduce operation every time it runs, instead of using the cache on the store and emitting a single total value
  2. windows are not respected, as can be seen from the output

Can you please explain how I am misunderstanding the mechanics of Kafka Streams in this case?
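For reference, here is a minimal Python sketch (not Kafka Streams, and not the code in question, which is omitted above) of the tumbling-window bucketing I expect: each record is assigned to the window whose start is its timestamp floored to the 10-second window size, and totals are accumulated per (key, window) pair. The record values and timestamps are made up for illustration.

```python
WINDOW_MS = 10_000  # hypothetical 10-second tumbling window

def window_start(timestamp_ms: int) -> int:
    """Align a record timestamp to the start of its tumbling window."""
    return timestamp_ms - (timestamp_ms % WINDOW_MS)

def aggregate(records):
    """records: iterable of (key, amount, timestamp_ms).
    Returns {(key, window_start): total} -- the final per-window totals
    I expect a windowed aggregation to produce."""
    totals = {}
    for key, amount, ts in records:
        bucket = (key, window_start(ts))
        totals[bucket] = totals.get(bucket, 0) + amount
    return totals

# Two records land in the first window, one in the next window
records = [("a", 1, 1_000), ("a", 2, 5_000), ("a", 3, 12_000)]
print(aggregate(records))
# {('a', 0): 3, ('a', 10000): 3}
```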

zaxme
    I guess you should use the suppress() method. https://kafka.apache.org/21/documentation/streams/developer-guide/dsl-api.html#id31 – user152468 Mar 13 '19 at 13:02
  • Thank you, @user152468. It seems that for some reason the Scala DSL for Kafka Streams doesn't expose the `suppress` method. I am going to try and integrate it with `.inner`. – zaxme Mar 13 '19 at 13:18
  • Adding suppress() to Scala API is WIP: https://issues.apache.org/jira/browse/KAFKA-7778 – Matthias J. Sax Mar 17 '19 at 17:24
  • You might be interested in: https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/ – Matthias J. Sax Mar 17 '19 at 17:26
  • Reference links seem to be broken, please refer to https://stackoverflow.com/questions/51779405/understanding-kafka-stream-groupby-and-window?rq=2 instead – frblazquez Apr 28 '23 at 07:37
  • I changed GitHub aliases. Links should work now. – zaxme Apr 29 '23 at 09:48
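The `suppress()` operator the comments point to holds back intermediate updates and forwards a window's result only once the window has closed. As a rough illustration of that emission pattern (a plain Python simulation with a zero grace period and made-up data, not the real Kafka Streams API):

```python
WINDOW_MS = 10_000  # hypothetical 10-second tumbling window, zero grace period

def process(records):
    """records: iterable of (key, amount, timestamp_ms) in timestamp order.
    Buffers per-window totals and emits each one only after stream time
    passes the window end, instead of emitting every intermediate update."""
    buffered = {}  # (key, window_start) -> running total
    emitted = []
    for key, amount, ts in records:
        # emit (and drop) every buffered window that has now closed
        for k, w in [kw for kw in buffered if kw[1] + WINDOW_MS <= ts]:
            emitted.append((k, w, buffered.pop((k, w))))
        w = ts - ts % WINDOW_MS
        buffered[(key, w)] = buffered.get((key, w), 0) + amount
    # flush anything still open at end of input
    emitted.extend((k, w, total) for (k, w), total in buffered.items())
    return emitted

events = [("a", 1, 1_000), ("a", 2, 5_000), ("a", 3, 12_000)]
print(process(events))
# [('a', 0, 3), ('a', 10000, 3)]
```

The key difference from an unsuppressed aggregation is that the first window's total of 3 is emitted exactly once (when the record at 12,000 ms moves stream time past the window end), rather than once per update.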

0 Answers